Making Cloudflare the best platform for building AI Agents
2025-02-25
Today we’re excited to share a few announcements on how we’re making it even easier to build AI agents on Cloudflare, including a new agents-sdk JavaScript framework and updates to Workers AI.
AI co-pilot
A co-pilot acts as an intelligent assistant that can provide hotel and itinerary recommendations based on your preferences. If you have questions, it can understand and respond to natural language queries and offer guidance and suggestions. However, it is unable to take the next steps to execute the end-to-end action on its own.
Agent
An agent combines an AI model's ability to make judgments with the ability to call the relevant tools to execute a task. An agent's output is nondeterministic, given real-time availability and pricing changes, dynamic prioritization of constraints, the ability to recover from failures, and adaptive decision-making based on intermediate results. In other words, if flights or hotels are unavailable, an agent can reassess, suggest a new itinerary with altered dates or locations, and continue executing your travel booking.
You can now add agent powers to any existing Workers project with just one command:
```sh
$ npm i agents-sdk
```
… or if you want to build something from scratch, you can bootstrap your project with the agents-starter template:
```sh
$ npm create cloudflare@latest -- --template cloudflare/agents-starter
# ... and then deploy it
$ npm run deploy
```
agents-sdk is a framework that allows you to build agents — software that can autonomously execute tasks — and deploy them directly into production on Cloudflare Workers.
Your agent can start with the basics and act on HTTP requests…
```javascript
import { Agent } from "agents-sdk";

export class IntelligentAgent extends Agent {
  async onRequest(request) {
    // Transform intention into response
    return new Response("Ready to assist.");
  }
}
```
Although this is just the initial release of agents-sdk, we wanted to ship more than just a thin wrapper over an existing library. Agents can communicate with clients in real time, persist state, execute long-running tasks on a schedule, send emails, run asynchronous workflows, browse the web, query data from your Postgres database, call AI models, and support human-in-the-loop use-cases. All of this works today, out of the box.
For example, you can build a powerful chat agent with the AIChatAgent class:
```typescript
// src/index.ts
export class Chat extends AIChatAgent<Env> {
  /**
   * Handles incoming chat messages and manages the response stream
   * @param onFinish - Callback function executed when streaming completes
   */
  async onChatMessage(onFinish: StreamTextOnFinishCallback<any>) {
    // Create a streaming response that handles both text and tool outputs
    return agentContext.run(this, async () => {
      const dataStreamResponse = createDataStreamResponse({
        execute: async (dataStream) => {
          // Process any pending tool calls from previous messages
          // This handles human-in-the-loop confirmations for tools
          const processedMessages = await processToolCalls({
            messages: this.messages,
            dataStream,
            tools,
            executions,
          });

          // Initialize OpenAI client with API key from environment
          const openai = createOpenAI({
            apiKey: this.env.OPENAI_API_KEY,
          });

          // Cloudflare AI Gateway
          // const openai = createOpenAI({
          //   apiKey: this.env.OPENAI_API_KEY,
          //   baseURL: this.env.GATEWAY_BASE_URL,
          // });

          // Stream the AI response using GPT-4
          const result = streamText({
            model: openai("gpt-4o-2024-11-20"),
            system: `
              You are a helpful assistant that can do various tasks. If the user asks, then you can also schedule tasks to be executed later. The input may have a date/time/cron pattern to be input as an object into a scheduler. The time is now: ${new Date().toISOString()}.
            `,
            messages: processedMessages,
            tools,
            onFinish,
            maxSteps: 10,
          });

          // Merge the AI response stream with tool execution outputs
          result.mergeIntoDataStream(dataStream);
        },
      });

      return dataStreamResponse;
    });
  }

  async executeTask(description: string, task: Schedule<string>) {
    await this.saveMessages([
      ...this.messages,
      {
        id: generateId(),
        role: "user",
        content: `scheduled message: ${description}`,
      },
    ]);
  }
}

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext) {
    if (!env.OPENAI_API_KEY) {
      console.error(
        "OPENAI_API_KEY is not set, don't forget to set it locally in .dev.vars, and use `wrangler secret bulk .dev.vars` to upload it to production"
      );
      return new Response("OPENAI_API_KEY is not set", { status: 500 });
    }
    return (
      // Route the request to our agent or return 404 if not found
      (await routeAgentRequest(request, env)) ||
      new Response("Not found", { status: 404 })
    );
  },
} satisfies ExportedHandler<Env>;
```
… and connect to your Agent from any React-based front-end with the useAgent hook, which can automatically establish a bidirectional WebSocket, sync client state, and allow you to build Agent-based applications without a mountain of bespoke code:
```tsx
// src/app.tsx
import { useAgent } from "agents-sdk/react";

const agent = useAgent({
  agent: "chat",
});
```
We spent some time thinking about the production story here too: an agent framework that absolves itself of the hard parts — durably persisting state, handling long-running tasks & loops, and horizontal scale — is only going to get you so far. Agents built with agents-sdk can be deployed directly to Cloudflare and run on top of Durable Objects — which you can think of as stateful micro-servers that can scale to tens of millions — and are able to run wherever they need to: close to a user for low latency, close to your data, or anywhere in between.
agents-sdk also exposes:
- Integration with React applications via a useAgent hook that can automatically set up a WebSocket connection between your app and an agent
- An AIChatAgent extension that makes it easier to build intelligent chat agents
- State management APIs via this.setState, as well as a native sql API for writing and querying data within each Agent
- State synchronization between frontend applications and the agent state
- Agent routing, enabling agent-per-user or agent-per-workflow use-cases. Spawn millions (or tens of millions) of agents without having to think about how to make the infrastructure work, provision CPU, or scale out storage.
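To make the agent-per-user pattern concrete, here is a minimal sketch of deterministic routing. Note this is not the SDK's actual API — the function names and path scheme are invented for illustration: the idea is simply that the same user always maps to the same named agent instance.

```typescript
// Hypothetical illustration: derive a stable, per-user agent name so that
// every request from the same user is routed to the same agent instance.
function agentNameForUser(userId: string): string {
  // Normalize so "User-42" and "user-42" land on the same instance
  return `chat-${userId.trim().toLowerCase()}`;
}

// Sketch of routing: the agent name becomes part of the request path,
// and the platform maps that name to exactly one running instance.
function routeForUser(userId: string): string {
  return `/agents/chat/${agentNameForUser(userId)}`;
}

console.log(routeForUser("User-42")); // /agents/chat/chat-user-42
```

Because the name is derived deterministically, no registry or coordination service is needed to find a user's agent — any Worker anywhere can compute the same route.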
Over the coming weeks, expect to see even more here: tighter integration with email APIs to enable more human-in-the-loop use-cases, hooks into WebRTC for voice & video interactivity, a built-in evaluation (evals) framework, and the ability to self-host agents on your own infrastructure.
We’re aiming high here: we think this is just the beginning of what agents are capable of, and we think we can make Workers the best place (but not the only place) to build & run them.
When users express needs conversationally, tool calling converts these requests into structured formats like JSON that APIs can understand and process, allowing the AI to interact with databases, services, and external systems. This is essential for building agents: it allows users to express complex intentions in natural language, and the AI to decompose these requests, call the appropriate tools, evaluate responses, and deliver meaningful outcomes.
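As an illustration of that decomposition (the tool names and shapes here are invented for the example, not a specific API), a model's tool call typically arrives as a tool name plus JSON-encoded arguments, which your code parses and dispatches to a real function:

```typescript
// Invented example tools; in a real agent these would call actual APIs.
const travelTools: Record<string, (args: any) => string> = {
  searchFlights: ({ from, to, date }) =>
    `searching flights ${from} -> ${to} on ${date}`,
  bookHotel: ({ city, nights }) => `booking ${nights} nights in ${city}`,
};

// A tool call as a model typically emits it: a name plus JSON arguments.
interface ToolCall {
  name: string;
  arguments: string; // JSON-encoded by the model
}

// Parse the structured arguments and dispatch to the matching tool.
function executeToolCall(call: ToolCall): string {
  const tool = travelTools[call.name];
  if (!tool) throw new Error(`unknown tool: ${call.name}`);
  return tool(JSON.parse(call.arguments));
}

// "Book me a hotel in Lisbon for 3 nights" might decompose into:
const result = executeToolCall({
  name: "bookHotel",
  arguments: JSON.stringify({ city: "Lisbon", nights: 3 }),
});
```

The agent loop then feeds each tool's result back to the model, which decides whether to call another tool or produce a final answer.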
When using tool calling or building AI agents, the text generation model must respond with valid JSON objects rather than natural language. Today, we're adding JSON mode support to Workers AI, enabling applications to request a structured output response when interacting with AI models. Here's a request to @cf/meta/llama-3.1-8b-instruct-fp8-fast using JSON mode:
```json
{
  "messages": [
    {
      "role": "system",
      "content": "Extract data about a country."
    },
    {
      "role": "user",
      "content": "Tell me about India."
    }
  ],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string"
        },
        "capital": {
          "type": "string"
        },
        "languages": {
          "type": "array",
          "items": {
            "type": "string"
          }
        }
      },
      "required": ["name", "capital", "languages"]
    }
  }
}
```
And here’s how the model will respond:
```json
{
  "response": {
    "name": "India",
    "capital": "New Delhi",
    "languages": [
      "Hindi",
      "English",
      "Bengali",
      "Telugu",
      "Marathi",
      "Tamil",
      "Gujarati",
      "Urdu",
      "Kannada",
      "Odia",
      "Malayalam",
      "Punjabi",
      "Sanskrit"
    ]
  }
}
```
As you can see, the model is complying with the JSON schema definition in the request and responding with a validated JSON object. JSON mode is compatible with OpenAI’s response_format implementation:
```javascript
response_format: {
  title: "JSON Mode",
  type: "object",
  properties: {
    type: {
      type: "string",
      enum: ["json_object", "json_schema"],
    },
    json_schema: {},
  }
}
```
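To illustrate what a "validated JSON object" buys you on the client side, here is a deliberately tiny checker — a sketch, not a real JSON Schema validator — that verifies a parsed response carries the required keys from the schema in the request above:

```typescript
// Minimal, illustrative check: does the parsed object carry every required
// key? Production code would use a full JSON Schema validator instead.
function hasRequiredKeys(
  obj: Record<string, unknown>,
  required: string[]
): boolean {
  return required.every((key) => key in obj);
}

// The model's structured output, already parsed from the response body.
const countryData = {
  name: "India",
  capital: "New Delhi",
  languages: ["Hindi", "English"],
};

const isValid =
  hasRequiredKeys(countryData, ["name", "capital", "languages"]) &&
  typeof countryData.name === "string" &&
  typeof countryData.capital === "string" &&
  Array.isArray(countryData.languages);
```

Because JSON mode constrains generation to the schema, this kind of check becomes a safety net rather than the primary parsing strategy — you no longer need to strip markdown fences or retry on malformed output.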
This is the list of models that now support JSON mode:
We will continue extending this list to keep up with new and requested models.
Lastly, we are changing how we restrict the size of AI requests to text generation models: moving from byte counts to token counts, introducing the concept of a context window, and raising the limits of the models in our catalog.
In generative AI, the context window is the sum of the number of input, reasoning, and completion or response tokens a model supports. You can now find the context window limit on each model page in our developer documentation and decide which suits your requirements and use case.
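For example (with made-up numbers — check each model page for real limits), if a model's context window is 32,768 tokens, the input, any reasoning tokens, and the completion must all fit inside it together, so the room left for a response is what remains after the prompt:

```typescript
// The context window bounds input + reasoning + completion tokens combined.
const contextWindow = 32_768;       // hypothetical model limit
const inputTokens = 24_000;         // tokens already spent on the prompt
const reservedForReasoning = 2_000; // budget held back for reasoning tokens

// What remains is the largest completion we can safely request.
const maxCompletionTokens = contextWindow - inputTokens - reservedForReasoning;

console.log(maxCompletionTokens); // 6768
```

This is why a long system prompt or chat history directly shrinks the maximum possible response, and why token-count limits are a more meaningful restriction than byte counts.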
JSON mode is also the perfect companion when using function calling. You can use structured JSON outputs with traditional function calling or the Vercel AI SDK via the workers-ai-provider.
One of the most common ways to build with AI tooling today is by using the popular AI SDK. Cloudflare’s provider for the AI SDK makes it easy to use Workers AI the same way you would call any other LLM, directly from your code.
In the most recent version, we’ve shipped the following improvements:
- Tool calling enabled for generateText
- Streaming now works out of the box
- Usage statistics are now enabled
- You can now use AI Gateway, even when streaming
A key part of building agents is using LLMs for routing, deciding which tools to call next, and summarizing structured and unstructured data. All of these things need to happen quickly, as they are on the critical path of the user-facing experience.
Workers AI, with its globally distributed fleet of GPUs, is a perfect fit for smaller, low-latency LLMs, so we’re excited to make it easy to use with tools developers are already familiar with.
\nSince launching Workers in 2017, we’ve been building a platform to allow developers to build applications that are fast, scalable, and cost-efficient from day one. We took a fundamentally different approach from the way code was previously run on servers, making a bet about what the future of applications was going to look like — isolates running on a global network, in a way that was truly serverless. No regions, no concurrency management, no managing or scaling infrastructure.
The release of Workers was just the beginning, and we continued shipping primitives to extend what developers could build. Some more familiar, like a key-value store (Workers KV), and some that we thought would play a role in enabling net new use cases like Durable Objects. While we didn’t quite predict AI agents (though “Agents” was one of the proposed names for Durable Objects), we inadvertently created the perfect platform for building them.
What do we mean by that?
To run agents efficiently, you need a system that can seamlessly scale up and down to support their constant stop, go, wait patterns. Agents are essentially long-running tasks, often waiting on slow reasoning LLMs and external tools to execute. With Cloudflare, you don’t have to pay for long-running processes when your code is not executing: Cloudflare Workers is designed to scale down and charge you only for CPU time, as opposed to wall-clock time.
In many cases, especially when calling LLMs, the difference can be orders of magnitude — e.g. 2–3 milliseconds of CPU time vs. 10 seconds of wall-clock time. When building on Workers, we pass that difference on to you as cost savings.
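Using the example numbers above, the gap between the two billing models is easy to quantify:

```typescript
// Suppose one request spends 10 seconds waiting on an LLM (wall-clock time)
// but only 3 ms actually executing your code (CPU time).
const wallClockMs = 10_000;
const cpuMs = 3;

// Billed on wall-clock time, you pay for the entire wait;
// billed on CPU time, you pay only for the 3 ms of real work.
const ratio = wallClockMs / cpuMs;

console.log(Math.round(ratio)); // 3333
```

In this (illustrative) case, CPU-time billing makes the compute portion of the request over three thousand times cheaper than paying for the full wall-clock duration.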
We took a similar serverless approach when it comes to inference itself. When you need to call an AI model, you need it to be instantaneously available. While the foundation model providers offer APIs that make it possible to just call the LLM, if you’re running open-source models, LoRAs, or self-trained models, most cloud providers today require you to pre-provision resources for what your peak traffic will look like. This means that the rest of the time, you’re still paying for GPUs to sit there idle. With Workers AI, you pay only when you’re calling our inference APIs, as opposed to paying for unused infrastructure. In fact, you don’t have to think about infrastructure at all, which is the principle at the core of everything we do.
Durable Objects and Workflows provide a robust programming model that ensures guaranteed execution for asynchronous tasks that require persistence and reliability. This makes them ideal for handling complex operations like long-running deep thinking LLM calls, human-in-the-loop approval processes, or interactions with unreliable third-party APIs. By maintaining state across requests and automatically handling retries, these tools create a resilient foundation for building sophisticated AI agents that can perform complex, multistep tasks without losing context or progress, even when operations take significant time to complete.
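The retry-with-persisted-state idea can be sketched independently of any platform API. Everything below is illustrative — the step names and the in-memory `state` object stand in for durable storage — but it shows why a retry resumes instead of redoing completed work:

```typescript
// Illustrative: each step persists its result before the next step runs,
// so a retry after a failure skips work that already completed.
type State = Record<string, string>;

function runStep(state: State, name: string, fn: () => string): string {
  if (name in state) return state[name]; // already completed: reuse on retry
  const result = fn(); // may throw; nothing is recorded on failure
  state[name] = result; // persist before moving on
  return result;
}

const state: State = {};
runStep(state, "fetchItinerary", () => "itinerary");
try {
  runStep(state, "bookFlight", () => {
    throw new Error("airline API timeout");
  });
} catch {
  // Retry: the completed first step is read back from state, not re-run.
  runStep(state, "fetchItinerary", () => "itinerary");
  runStep(state, "bookFlight", () => "booked");
}
```

Durable execution systems apply this same pattern with real persistence: progress survives crashes and restarts, so only the failed step is retried.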
Did you catch all of that?
No worries if not: we’ve updated our agents documentation to include everything we talked about above, from breaking down the basics of agents, to showing you how to tackle foundational examples of building with agents.
We’ve also updated our Workers prompt with knowledge of the agents-sdk library, so you can use Cursor, Windsurf, Zed, ChatGPT or Claude to help you build AI Agents and deploy them to Cloudflare.
We’re just getting started, and we’d love to see all that you build. Please join our Discord, ask questions, and tell us what you’re building.
"],"published_at":[0,"2025-02-25T14:00+00:00"],"updated_at":[0,"2025-02-25T14:53:13.966Z"],"feature_image":[0,"https://cf-assets.www.cloudflare.com/zkvhlag99gkb/57zjl5TyJLzl0LDsZ8aKZc/5619952f43a307d29b849fa938bfa046/image4.png"],"tags":[1,[[0,{"id":[0,"6Foe3R8of95cWVnQwe5Toi"],"name":[0,"AI"],"slug":[0,"ai"]}],[0,{"id":[0,"6hbkItfupogJP3aRDAq6v8"],"name":[0,"Cloudflare Workers"],"slug":[0,"workers"]}],[0,{"id":[0,"5v2UZdTRX1Rw9akmhexnxs"],"name":[0,"Durable Objects"],"slug":[0,"durable-objects"]}]]],"relatedTags":[0],"authors":[1,[[0,{"name":[0,"Rita Kozlov"],"slug":[0,"rita"],"bio":[0,null],"profile_image":[0,"https://cf-assets.www.cloudflare.com/zkvhlag99gkb/56u5zfWi9255rUocOgksl0/5eb2eb16e26893d259f7b00af85e8417/rita.png"],"location":[0,null],"website":[0,null],"twitter":[0,"@ritakozlov_"],"facebook":[0,null]}],[0,{"name":[0,"Sunil Pai"],"slug":[0,"sunil"],"bio":[0,"JavaScript and Les Pauls. Worked at Cloudflare once, left and created PartyKit, came back wiser."],"profile_image":[0,"https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2xnINigwFaBtTwOffppE85/596c43dfef6205e9c5b71c2caff4bea9/Sunil_Pai.png"],"location":[0,"London"],"website":[0,null],"twitter":[0,"@threepointone"],"facebook":[0,null]}],[0,{"name":[0,"Matt Silverlock"],"slug":[0,"silverlock"],"bio":[0,"Director of Product at Cloudflare."],"profile_image":[0,"https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7xP5qePZD9eyVtwIesXYxh/e714aaa573161ec9eb48d59bd1aa6225/silverlock.jpeg"],"location":[0,null],"website":[0,null],"twitter":[0,"@elithrar"],"facebook":[0,null]}]]],"meta_description":[0,"Today we’re excited to share a few announcements on how we’re making it even easier to build AI agents on Cloudflare, including a new agents-sdk, a new JavaScript agents framework, and updates to Workers AI. 
"],"primary_author":[0,{}],"localeList":[0,{"name":[0,"blog-english-only"],"enUS":[0,"English for Locale"],"zhCN":[0,"No Page for Locale"],"zhHansCN":[0,"No Page for Locale"],"zhTW":[0,"No Page for Locale"],"frFR":[0,"No Page for Locale"],"deDE":[0,"No Page for Locale"],"itIT":[0,"No Page for Locale"],"jaJP":[0,"No Page for Locale"],"koKR":[0,"No Page for Locale"],"ptBR":[0,"No Page for Locale"],"esLA":[0,"No Page for Locale"],"esES":[0,"No Page for Locale"],"enAU":[0,"No Page for Locale"],"enCA":[0,"No Page for Locale"],"enIN":[0,"No Page for Locale"],"enGB":[0,"No Page for Locale"],"idID":[0,"No Page for Locale"],"ruRU":[0,"No Page for Locale"],"svSE":[0,"No Page for Locale"],"viVN":[0,"No Page for Locale"],"plPL":[0,"No Page for Locale"],"arAR":[0,"No Page for Locale"],"nlNL":[0,"No Page for Locale"],"thTH":[0,"No Page for Locale"],"trTR":[0,"No Page for Locale"],"heIL":[0,"No Page for Locale"],"lvLV":[0,"No Page for Locale"],"etEE":[0,"No Page for Locale"],"ltLT":[0,"No Page for Locale"]}],"url":[0,"https://blog.cloudflare.com/build-ai-agents-on-cloudflare"],"metadata":[0,{"title":[0,"Making Cloudflare the best platform for building AI Agents"],"description":[0,"Today we’re excited to share a few announcements on how we’re making it even easier to build AI agents on Cloudflare, including a new agents-sdk, a new JavaScript agents framework, and updates to Workers AI. 
"],"imgPreview":[0,"https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7qCHaYOJHWsfpBxtkrnTyC/2fb575f8d393d894023f8f14be4ef5c4/Making_Cloudflare_the_best_platform_for_building_AI_Agents-OG.png"]}]}],"translations":[0,{"posts.by":[0,"By"],"footer.gdpr":[0,"GDPR"],"lang_blurb1":[0,"This post is also available in {lang1}."],"lang_blurb2":[0,"This post is also available in {lang1} and {lang2}."],"lang_blurb3":[0,"This post is also available in {lang1}, {lang2} and {lang3}."],"footer.press":[0,"Press"],"header.title":[0,"The Cloudflare Blog"],"search.clear":[0,"Clear"],"search.filter":[0,"Filter"],"search.source":[0,"Source"],"footer.careers":[0,"Careers"],"footer.company":[0,"Company"],"footer.support":[0,"Support"],"footer.the_net":[0,"theNet"],"search.filters":[0,"Filters"],"footer.our_team":[0,"Our team"],"footer.webinars":[0,"Webinars"],"page.more_posts":[0,"More posts"],"posts.time_read":[0,"{time} min read"],"search.language":[0,"Language"],"footer.community":[0,"Community"],"footer.resources":[0,"Resources"],"footer.solutions":[0,"Solutions"],"footer.trademark":[0,"Trademark"],"header.subscribe":[0,"Subscribe"],"footer.compliance":[0,"Compliance"],"footer.free_plans":[0,"Free plans"],"footer.impact_ESG":[0,"Impact/ESG"],"posts.follow_on_X":[0,"Follow on X"],"footer.help_center":[0,"Help center"],"footer.network_map":[0,"Network Map"],"header.please_wait":[0,"Please Wait"],"page.related_posts":[0,"Related posts"],"search.result_stat":[0,"Results {search_range} of {search_total} for {search_keyword}"],"footer.case_studies":[0,"Case Studies"],"footer.connect_2024":[0,"Connect 2024"],"footer.terms_of_use":[0,"Terms of Use"],"footer.white_papers":[0,"White Papers"],"footer.cloudflare_tv":[0,"Cloudflare TV"],"footer.community_hub":[0,"Community Hub"],"footer.compare_plans":[0,"Compare plans"],"footer.contact_sales":[0,"Contact Sales"],"header.contact_sales":[0,"Contact Sales"],"header.email_address":[0,"Email Address"],"page.error.not_found":[0,"Page not 
found"],"footer.developer_docs":[0,"Developer docs"],"footer.privacy_policy":[0,"Privacy Policy"],"footer.request_a_demo":[0,"Request a demo"],"page.continue_reading":[0,"Continue reading"],"footer.analysts_report":[0,"Analyst reports"],"footer.for_enterprises":[0,"For enterprises"],"footer.getting_started":[0,"Getting Started"],"footer.learning_center":[0,"Learning Center"],"footer.project_galileo":[0,"Project Galileo"],"pagination.newer_posts":[0,"Newer Posts"],"pagination.older_posts":[0,"Older Posts"],"posts.social_buttons.x":[0,"Discuss on X"],"search.icon_aria_label":[0,"Search"],"search.source_location":[0,"Source/Location"],"footer.about_cloudflare":[0,"About Cloudflare"],"footer.athenian_project":[0,"Athenian Project"],"footer.become_a_partner":[0,"Become a partner"],"footer.cloudflare_radar":[0,"Cloudflare Radar"],"footer.network_services":[0,"Network services"],"footer.trust_and_safety":[0,"Trust & Safety"],"header.get_started_free":[0,"Get Started Free"],"page.search.placeholder":[0,"Search Cloudflare"],"footer.cloudflare_status":[0,"Cloudflare Status"],"footer.cookie_preference":[0,"Cookie Preferences"],"header.valid_email_error":[0,"Must be valid email."],"search.result_stat_empty":[0,"Results {search_range} of {search_total}"],"footer.connectivity_cloud":[0,"Connectivity cloud"],"footer.developer_services":[0,"Developer services"],"footer.investor_relations":[0,"Investor relations"],"page.not_found.error_code":[0,"Error Code: 404"],"search.autocomplete_title":[0,"Insert a query. 
Press enter to send"],"footer.logos_and_press_kit":[0,"Logos & press kit"],"footer.application_services":[0,"Application services"],"footer.get_a_recommendation":[0,"Get a recommendation"],"posts.social_buttons.reddit":[0,"Discuss on Reddit"],"footer.sse_and_sase_services":[0,"SSE and SASE services"],"page.not_found.outdated_link":[0,"You may have used an outdated link, or you may have typed the address incorrectly."],"footer.report_security_issues":[0,"Report Security Issues"],"page.error.error_message_page":[0,"Sorry, we can't find the page you are looking for."],"header.subscribe_notifications":[0,"Subscribe to receive notifications of new posts:"],"footer.cloudflare_for_campaigns":[0,"Cloudflare for Campaigns"],"header.subscription_confimation":[0,"Subscription confirmed. Thank you for subscribing!"],"posts.social_buttons.hackernews":[0,"Discuss on Hacker News"],"footer.diversity_equity_inclusion":[0,"Diversity, equity & inclusion"],"footer.critical_infrastructure_defense_project":[0,"Critical Infrastructure Defense Project"]}]}" ssr="" client="load" opts="{"name":"PostCard","value":true}" await-children="">2025-02-25
Today we’re excited to share a few announcements on how we’re making it even easier to build AI agents on Cloudflare....
2024-04-05
We're thrilled to announce that PartyKit, a trailblazer in enabling developers to craft ambitious real-time, collaborative, multiplayer applications, is now a part of Cloudflare...
2022-05-09
We are proud to announce that Wrangler goes public today for general usage, and can’t wait to see what people build with it...
2021-11-16
We're excited to announce the second-generation of our developer tooling for Cloudflare Workers. It’s a new developer experience that’s out-of-the-box, lightning fast, and can even run Workers on a local machine. (Yes!)...