
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Mon, 13 Apr 2026 18:50:08 GMT</lastBuildDate>
        <item>
            <title><![CDATA[Keep AI interactions secure and risk-free with Guardrails in AI Gateway]]></title>
            <link>https://blog.cloudflare.com/guardrails-in-ai-gateway/</link>
            <pubDate>Wed, 26 Feb 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ Deploy AI safely with built-in Guardrails in AI Gateway. Flag and block harmful or inappropriate content, protect personal data, and ensure compliance in real-time ]]></description>
            <content:encoded><![CDATA[ <p>Moving AI from experimentation to production is not without its challenges. Developers must balance rapid innovation with the need to <a href="https://blog.cloudflare.com/best-practices-sase-for-ai/">protect users and meet strict regulatory requirements</a>. To address this, we are introducing <b>Guardrails in AI Gateway</b>, designed to help you deploy AI safely and confidently. </p>
    <div>
      <h3>Why safety matters</h3>
      <a href="#why-safety-matters">
        
      </a>
    </div>
    <p>LLMs are inherently non-deterministic, meaning outputs can be unpredictable. Additionally, you have no control over your users, and they may ask for something wildly inappropriate or attempt to elicit an inappropriate response from the AI. Now, imagine launching an AI-powered application without clear visibility into the potential for harmful or inappropriate content. Not only does this risk user safety, but it also puts your brand reputation on the line.</p><p>The <a href="https://genai.owasp.org/resource/owasp-top-10-for-llm-applications-2025/"><u>OWASP Top 10 for Large Language Model (LLM) Applications</u></a> was created to address the security risks specific to AI applications. This industry-driven standard identifies the most critical security vulnerabilities affecting LLM-based and generative AI applications. It’s designed to educate developers, security professionals, and organizations on the risks of deploying and managing these systems.</p><p>The stakes are even higher with new regulations being introduced:</p><ul><li><p><a href="https://artificialintelligenceact.eu/high-level-summary/"><b><u>European Union Artificial Intelligence Act</u></b></a>: In force since August 1, 2024, the AI Act has specific provisions on establishing a risk management system for AI systems, data governance, technical documentation, and record keeping of risks/abuse. </p></li><li><p><a href="https://commission.europa.eu/strategy-and-policy/priorities-2019-2024/europe-fit-digital-age/digital-services-act_en"><b><u>European Union Digital Services Act (DSA)</u></b></a>: Adopted in 2022, the DSA is designed to enhance safety and accountability online, including mitigating the spread of illegal content and safeguarding minors from harmful content.</p></li></ul><p>These developments emphasize why robust safety controls must be part of every AI application.</p>
    <div>
      <h3>The challenge</h3>
      <a href="#the-challenge">
        
      </a>
    </div>
    <p>Developers building AI applications today face a complex set of challenges, hindering their ability to create safe and reliable experiences:</p><ul><li><p><b>Inconsistency across models: </b>The rapid advancement of AI models and providers often leads to varying built-in safety features. This inconsistency arises because different AI companies have unique philosophies, risk tolerances, and regulatory requirements. Some models prioritize openness and flexibility, while others enforce stricter moderation based on ethical and legal considerations. Factors such as company policies, regional compliance laws, fine-tuning methods, and intended use cases all contribute to these differences, making it difficult for developers to deliver a uniformly safe experience across different model providers.</p></li><li><p><b>Lack of visibility into unsafe or inappropriate content: </b>Without proper tools, developers struggle to monitor user inputs and model outputs, making it challenging to identify and manage harmful or inappropriate content effectively when trying out different models and providers.</p></li></ul><p>The answer? A standardized, provider-agnostic solution that offers comprehensive observability and logs in one unified interface, along with granular control over content moderation.</p>
    <div>
      <h3>The solution: Guardrails in AI Gateway</h3>
      <a href="#the-solution-guardrails-in-ai-gateway">
        
      </a>
    </div>
    <p><a href="https://developers.cloudflare.com/ai-gateway/"><u>AI Gateway</u></a> is a proxy service that sits between your AI application and its model providers (like OpenAI, Anthropic, DeepSeek, <a href="https://developers.cloudflare.com/ai-gateway/providers/"><u>and more</u></a>). To address the challenges of deploying AI safely, AI Gateway now includes safety guardrails that provide a consistent and safe experience, regardless of the model or provider you use.</p><p>AI Gateway gives you visibility into what users are asking, and how models are responding, through its detailed logs. This <a href="https://www.cloudflare.com/learning/performance/what-is-observability/">real-time observability</a> lets you monitor and assess content, so that potential issues can be identified proactively. The Guardrails feature offers granular control over content evaluation and actions taken. Customers can define precisely which interactions to evaluate (user prompts, model responses, or both) and specify the action to take (ignore, flag, or block) for each pre-defined hazard category.</p><p>Guardrails is built into AI Gateway, so setup is straightforward. Rather than manually calling a moderation tool, configuring flows, and managing flagging/blocking logic, you can enable Guardrails directly from your AI Gateway settings with just a few clicks. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6BX7CzkzqkhFwWS4Jk7tcQ/e1054c544eee7265d78a993a2897bc4c/image4.png" />
          </figure><p><sup><i>Figure 1. AI Gateway settings with Guardrails turned on, displaying selected hazard categories for prompts and responses, with flagged categories in orange and blocked categories in red</i></sup></p><p>Within the AI Gateway settings, developers can configure:</p><ul><li><p><b>Guardrails</b>: Enable or disable content moderation as needed.</p></li><li><p><b>Evaluation scope</b>: Select whether to moderate user prompts, model responses, or both.</p></li><li><p><b>Hazard categories</b>: Specify which categories to monitor and determine whether detected inappropriate content should be blocked or flagged.</p></li></ul>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3QnhWk1vn1SvFsKICvb9Rd/99f0cfc6e62b1f70db1d102277d73f83/image9.png" />
          </figure><p><sup><i>Figure 2. Advanced settings of Guardrails with granular moderation controls for different hazard categories</i></sup></p><p>By implementing these guardrails within AI Gateway, developers can focus on innovation, knowing that risks are proactively mitigated and their AI applications are operating responsibly.</p>
    <div>
      <h4>Leveraging Llama Guard on Workers AI</h4>
      <a href="#leveraging-llama-guard-on-workers-ai">
        
      </a>
    </div>
    <p>The Guardrails feature is currently powered by <a href="https://ai.meta.com/research/publications/llama-guard-llm-based-input-output-safeguard-for-human-ai-conversations/"><u>Llama Guard</u></a>, Meta’s open-source content moderation and safety tool, designed to detect harmful or unsafe content in both user inputs and AI-generated outputs. It provides real-time filtering and monitoring, helping to ensure responsible AI usage, reduce risk, and improve trust in AI-driven applications. Notably, organizations like <a href="https://mlcommons.org/"><u>MLCommons</u></a> use Llama Guard to evaluate the safety of foundation models. </p><p>Llama Guard can be used to protect against a wide range of content, such as violence and sexually explicit material. It also helps you safeguard sensitive data as outlined in the OWASP Top 10 for LLM Applications, like addresses, Social Security numbers, and credit card details. Specifically, Guardrails on AI Gateway uses the <a href="https://github.com/meta-llama/PurpleLlama/blob/main/Llama-Guard3/8B/MODEL_CARD.md"><u>Llama Guard 3 8B</u></a> model hosted on <a href="https://developers.cloudflare.com/workers-ai/"><u>Workers AI</u></a>, Cloudflare’s serverless, GPU-powered inference engine. Workers AI is well suited to this task because it operates on GPUs distributed across Cloudflare’s network, enabling low-latency inference and rapid content evaluation. We plan to add more models to Workers AI to power the Guardrails feature in the future. </p><p>Using Guardrails incurs <a href="https://developers.cloudflare.com/workers-ai/models/llama-guard-3-8b/"><u>Workers AI usage</u></a>, and that usage is reflected in your <a href="https://dash.cloudflare.com/?to=/:account/ai/workers-ai"><u>Workers AI dashboard</u></a>, allowing developers to track their inference consumption. </p>
    <div>
      <h3>How it works </h3>
      <a href="#how-it-works">
        
      </a>
    </div>
    <p>Functioning as a proxy between users and AI models, AI Gateway intercepts and inspects all interactions—both user prompts and model responses—for potentially harmful content.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/kzup1z7NTP4qkMXWgh8ns/b85afdfd8328f0f8c28e42058c305ecb/image5.jpg" />
          </figure><p><sup><i>Figure 3. Workflow diagram of Guardrails in AI Gateway, illustrating how prompts and responses are evaluated, along with the outcomes when content is deemed safe or unsafe</i></sup></p><p>When a user enters a prompt, AI Gateway runs that prompt through Llama Guard on Workers AI. Behind the scenes, AI Gateway uses the <a href="https://developers.cloudflare.com/ai-gateway/integrations/worker-binding-methods/"><u>AI Binding</u></a>, making it seamless to connect AI Gateway to Cloudflare <a href="https://developers.cloudflare.com/workers/"><u>Workers</u></a> and <a href="https://developers.cloudflare.com/workers-ai/"><u>Workers AI</u></a>. To provide better observability, every request that AI Gateway sends to Workers AI includes the <code>eventID</code> from the original request. This ensures that each request can be linked to its corresponding safety evaluations, allowing for efficient log grouping and traceability of AI interactions. </p><p>Depending on the configured settings, AI Gateway can either flag the prompt while still allowing the request to pass through to the AI model, or block the prompt entirely, preventing it from reaching the model. Blocking harmful prompts in this way not only enhances safety, but also reduces costs by preventing unnecessary processing and token consumption. If a prompt passes the safety check, it’s forwarded to the AI model.</p><p>Let’s see an example of a prompt blocked by Guardrails. In this example, the Guardrails settings have been turned on for user prompts and configured to block multiple categories, including “Non-Violent Crimes”. The user has asked OpenAI’s <code>gpt-4o-mini</code> model to “Teach me how to steal company data without being detected”. Instead of returning a response from the model, Guardrails has identified this prompt as containing “Non-Violent Crimes” content and blocked it, returning the message “Prompt blocked due to security configurations”. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/67JO7yPZ1gXOyX8u6Cx7CN/09b8ae7c7080fe7ada50bab70de4ead9/image9.png" />
          </figure><p><sup><i>Figure 4. AI Gateway log displaying a blocked prompt classified under “Non-Violent Crimes”, with an error message indicating the prompt was blocked due to security configurations</i></sup></p><p>AI Gateway determined this prompt was unsafe because the response from Workers AI Llama Guard indicated that category S2, Non-Violent Crimes, was <code>safe: false</code>. Since Guardrails was configured to block when the “Non-Violent Crimes” hazard category was detected, AI Gateway failed the request and did not send it to OpenAI. As a result, the request was unsuccessful and no token usage was incurred.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3U401FlyK3XxiySamMfNZv/0a25e547a247f352675f2d9bf4fdb9e0/image4.png" />
          </figure><p><sup><i>Figure 5. Guardrails log of a Llama Guard 3 8B request from Workers AI, flagging category S2 (Non-Violent Crimes), with the response indicating safe: false</i></sup></p><p>AI Gateway also inspects AI model responses before they reach the user, again evaluating them against the configured safety settings. Safe responses are delivered to the user. However, if any hazardous content is detected, the response is either flagged or blocked, and the outcome is logged in AI Gateway. </p><p>AI Gateway leverages specialized AI models trained to recognize various forms of harmful content to ensure only safe and appropriate information is shown to users. Currently, Guardrails only works with text-based AI models. </p>
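    <p>The flow described above can be sketched roughly as follows. This is a minimal illustration, not AI Gateway's implementation: the type and function names (<code>GuardrailsConfig</code>, <code>Verdict</code>, <code>decide</code>) are hypothetical, the verdict shape is only loosely modeled on the <code>safe: false</code> log in Figure 5, and category codes other than S2 are placeholders. It assumes that when several detected categories match, the strictest configured action wins.</p>

```typescript
// Hypothetical types for illustration; these are NOT the AI Gateway API.
type Action = "ignore" | "flag" | "block";
type Scope = "prompt" | "response";

interface GuardrailsConfig {
  enabled: boolean;
  // Per scope, the action configured for each hazard category code (e.g. "S2").
  actions: Record<Scope, Partial<Record<string, Action>>>;
}

// Assumed classifier result: a safe/unsafe flag plus the detected category codes.
interface Verdict {
  safe: boolean;
  categories: string[];
}

interface Decision {
  action: Action;
  matched: string[]; // categories that triggered the chosen action
}

const SEVERITY: Record<Action, number> = { ignore: 0, flag: 1, block: 2 };

// Pick the strictest configured action among the detected categories.
function decide(config: GuardrailsConfig, scope: Scope, verdict: Verdict): Decision {
  if (!config.enabled || verdict.safe) {
    return { action: "ignore", matched: [] };
  }
  let action: Action = "ignore";
  let matched: string[] = [];
  for (const category of verdict.categories) {
    const configured: Action = config.actions[scope][category] ?? "ignore";
    if (SEVERITY[configured] > SEVERITY[action]) {
      action = configured;
      matched = [category];
    } else if (configured === action && action !== "ignore") {
      matched.push(category);
    }
  }
  return { action, matched };
}

// Example: prompts are blocked on S2 (Non-Violent Crimes), as in Figure 4.
const config: GuardrailsConfig = {
  enabled: true,
  actions: {
    prompt: { S1: "block", S2: "block", S10: "flag" },
    response: { S10: "flag" },
  },
};

const decision = decide(config, "prompt", { safe: false, categories: ["S2"] });
console.log(decision.action); // "block"
```

    <p>In this sketch, a <code>block</code> decision on a prompt means the request is never forwarded to the model provider, which is why no tokens are consumed in the example above.</p>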
    <div>
      <h3>Deploy with confidence</h3>
      <a href="#deploy-with-confidence">
        
      </a>
    </div>
    <p>Safely deploying AI in today’s dynamic landscape requires acknowledging that while AI models are powerful, they are also inherently non-deterministic. By leveraging Guardrails within AI Gateway, you gain:</p><ul><li><p><b>Consistent moderation</b>: A uniform moderation layer that works across models and providers.</p></li><li><p><b>Enhanced safety and user trust</b>: Proactively protect users from harmful or inappropriate interactions.</p></li><li><p><b>Flexibility and control over allowed content</b>: Specify which categories to monitor and choose between flagging and outright blocking.</p></li><li><p><b>Auditing and compliance capabilities</b>: Stay ahead of evolving regulatory requirements with logs of user prompts, model responses, and enforced guardrails.</p></li></ul><p>If you aren’t yet using AI Gateway, Llama Guard is also available directly through Workers AI and will be available directly in the Cloudflare WAF in the near future. </p><p>Looking ahead, we plan to expand Guardrails’ capabilities further, to allow users to create their own classification categories, and to include protections against prompt injection and sensitive data exposure. To begin using Guardrails, check out our <a href="https://developers.cloudflare.com/ai-gateway/guardrails/"><u>developer documentation</u></a>. If you have any questions, please reach out in our <a href="http://discord.cloudflare.com/"><u>Discord community</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[AI Gateway]]></category>
            <category><![CDATA[Security]]></category>
            <guid isPermaLink="false">6kOBF6L96Ufw5NbPmavW7Y</guid>
            <dc:creator>Kathy Liao</dc:creator>
        </item>
        <item>
            <title><![CDATA[Cloudflare’s bigger, better, faster AI platform]]></title>
            <link>https://blog.cloudflare.com/workers-ai-bigger-better-faster/</link>
            <pubDate>Thu, 26 Sep 2024 13:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare helps you build AI applications with fast inference at the edge, optimized AI workflows, and vector database-powered RAG solutions. ]]></description>
            <content:encoded><![CDATA[ <p>Birthday Week 2024 marks the first anniversary of Cloudflare’s AI developer products: <a href="https://blog.cloudflare.com/workers-ai/"><u>Workers AI</u></a>, <a href="https://blog.cloudflare.com/announcing-ai-gateway/"><u>AI Gateway</u></a>, and <a href="https://blog.cloudflare.com/vectorize-vector-database-open-beta/"><u>Vectorize</u></a>. To celebrate, we’re excited to announce powerful new features to elevate the way you build with AI on Cloudflare.</p><p>Workers AI is getting a big upgrade, with more powerful GPUs that enable faster inference and bigger models. We’re also expanding our model catalog to dynamically support the models you want to run on our platform. Finally, we’re saying goodbye to neurons and revamping our pricing model to be simpler and cheaper. On AI Gateway, we’re moving forward on our vision of becoming an MLOps platform by introducing more powerful logs and human evaluations. Lastly, Vectorize is going GA, with expanded index sizes and faster queries.</p>
    <p>Whether you want the fastest inference at the edge, optimized AI workflows, or vector database-powered <a href="https://blogs.nvidia.com/blog/what-is-retrieval-augmented-generation/"><u>RAG</u></a>, we’re excited to help you harness the full potential of AI and get started on building with Cloudflare.</p>
    <div>
      <h3>The fast, global AI platform</h3>
      <a href="#the-fast-global-ai-platform">
        
      </a>
    </div>
    
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/56ofEZRtFHhkrfMaGC4RUb/3f69a2fc3722f67218297c65bd510941/image9.png" />
          </figure><p>The first thing that you notice about an application is how fast, or in many cases, how slow it is. This is especially true of AI applications, where the standard today is to wait for a response to be generated.</p><p>At Cloudflare, we’re obsessed with improving the performance of applications, and have been doubling down on our commitment to make AI fast. To live up to that commitment, we’re excited to announce that we’ve added even more powerful GPUs across our network to accelerate LLM performance.</p><p>In addition to more powerful GPUs, we’ve continued to expand our GPU footprint to get as close to the user as possible, reducing latency even further. Today, we have GPUs in over 180 cities, having doubled our capacity in a year. </p>
    <div>
      <h3>Bigger, better, faster</h3>
      <a href="#bigger-better-faster">
        
      </a>
    </div>
    <p>With the introduction of our new, more powerful GPUs, you can now run inference on significantly larger models, including Meta Llama 3.1 70B. Previously, our model catalog was limited to 8B parameter LLMs, but we can now support larger models, faster response times, and larger context windows. This means your applications can handle more complex tasks with greater efficiency.</p><div>
    <figure>
        <table>
            <colgroup>
                <col></col>
            </colgroup>
            <tbody>
                <tr>
                    <td>
                        <p><span><span><strong>Model</strong></span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>@cf/meta/llama-3.2-11b-vision-instruct</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>@cf/meta/llama-3.2-1b-instruct</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>@cf/meta/llama-3.2-3b-instruct</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>@cf/meta/llama-3.1-8b-instruct-fast</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>@cf/meta/llama-3.1-70b-instruct</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>@cf/black-forest-labs/flux-1-schnell</span></span></p>
                    </td>
                </tr>
            </tbody>
        </table>
    </figure>
</div><p>The models above are available on our new GPUs at faster speeds. In general, you can expect throughput of 80+ tokens per second (TPS) for 8B models and a time to first token (TTFT) of 300 ms (depending on where you are in the world).</p><p>Our model instances now support larger context windows, like the full 128K context window for Llama 3.1 and 3.2. To give you full visibility into performance, we’ll also be publishing metrics like TTFT, TPS, context window, and pricing on models in our <a href="https://developers.cloudflare.com/workers-ai/models/"><u>catalog</u></a>, so you know exactly what to expect.</p><p>We’re committed to bringing the best of open-source models to our platform, and that includes Meta’s release of the new Llama 3.2 collection of models. As a Meta launch partner, we were excited to have Day 0 support for the 11B vision model, as well as the 1B and 3B text-only models on Workers AI.</p><p>For more details on how we made Workers AI fast, take a look at our <a href="https://blog.cloudflare.com/making-workers-ai-faster"><u>technical blog post</u></a>, where we share a novel method for KV cache compression (it’s open-source!), as well as details on speculative decoding, our new hardware design, and more.</p>
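<p>To get a feel for what the throughput and time-to-first-token figures above mean for a full response, here is a back-of-the-envelope sketch. It assumes the simple model "total time ≈ TTFT + output tokens / TPS" with the 300 ms and 80 TPS figures as defaults. The function name is hypothetical, and real latency will vary with model, location, and load.</p>

```typescript
// Rough estimate only: total time is modeled as TTFT plus decode time.
// Defaults use the figures quoted above for 8B models (300 ms TTFT, 80 TPS).
function estimateResponseMs(
  outputTokens: number,
  ttftMs: number = 300,
  tokensPerSecond: number = 80,
): number {
  if (outputTokens < 0 || tokensPerSecond <= 0) {
    throw new RangeError("outputTokens must be >= 0 and tokensPerSecond must be > 0");
  }
  return ttftMs + (outputTokens / tokensPerSecond) * 1000;
}

// A 400-token answer: 300 ms + (400 / 80) s = 5300 ms.
console.log(estimateResponseMs(400)); // 5300
```

<p>With streaming, the user starts seeing output after roughly the TTFT, so the perceived wait is much shorter than this end-to-end estimate.</p>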
    <div>
      <h3>Greater model flexibility</h3>
      <a href="#greater-model-flexibility">
        
      </a>
    </div>
    <p>With our commitment to helping you run more powerful models faster, we are also expanding the breadth of models you can run on Workers AI with our Run Any* Model feature. Until now, we have manually curated and added only the most popular open source models to Workers AI. Now, we are opening up our catalog to the public, giving you the flexibility to choose from a broader selection of models. We will support models that are compatible with our GPUs and inference stack at the start (hence the asterisk on Run Any* Model). We’re launching this feature in closed beta and if you’d like to try it out, please fill out the <a href="https://forms.gle/h7FcaTF4Zo5dzNb68"><u>form</u></a>, so we can grant you access to this new feature.</p><p>The Workers AI model catalog will now be split into two parts: a static catalog and a dynamic catalog. Models in the static catalog will remain curated by Cloudflare and will include the most popular open source models with guarantees on availability and speed (the models listed above). These models will always be kept warm in our network, ensuring you don’t experience cold starts. The usage and pricing model remains serverless, where you will only be charged for the requests to the model and not the cold start times.</p><p>Models that are launched via Run Any* Model will make up the dynamic catalog. If the model is public, users can share an instance of that model. In the future, we will allow users to launch private instances of models as well.</p><p>This is just the first step towards running your own custom or private models on Workers AI. While we have already been supporting private models for select customers, we are working on making this capacity available to everyone in the near future.</p>
    <div>
      <h3>New Workers AI pricing</h3>
      <a href="#new-workers-ai-pricing">
        
      </a>
    </div>
    <p>We launched Workers AI during Birthday Week 2023 with the concept of “neurons” for pricing. Neurons were intended to simplify the unit of measure across various models on our platform, including text, image, audio, and more. However, over the past year, we have listened to your feedback and heard that neurons were difficult to grasp and challenging to compare with other providers. Additionally, the industry has matured, and new pricing standards have materialized. As such, we’re excited to announce that we will be moving towards unit-based pricing and saying goodbye to neurons.</p><p>Moving forward, Workers AI will be priced based on model task, size, and units. LLMs will be priced based on the model size (parameters) and input/output tokens. Image generation models will be priced based on the output image resolution and the number of steps. Embeddings models will be priced based on input tokens. Speech-to-text models will be priced on seconds of audio input. </p><div>
    <figure>
        <table>
            <colgroup>
                <col></col>
                <col></col>
                <col></col>
                <col></col>
            </colgroup>
            <tbody>
                <tr>
                    <td>
                        <p><span><span><strong>Model Task</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>Units</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>Model Size</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>Pricing</strong></span></span></p>
                    </td>
                </tr>
                <tr>
                    <td rowspan="5">
                        <p><span><span>LLMs (incl. Vision models)</span></span></p>
                    </td>
                    <td rowspan="5">
                        <p><span><span>Tokens in/out (blended)</span></span></p>
                    </td>
                    <td>
                        <p><span><span>&lt;= 3B parameters</span></span></p>
                    </td>
                    <td>
                        <p><span><span>$0.10 per Million Tokens</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>3.1B - 8B</span></span></p>
                    </td>
                    <td>
                        <p><span><span>$0.15 per Million Tokens</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>8.1B - 20B</span></span></p>
                    </td>
                    <td>
                        <p><span><span>$0.20 per Million Tokens</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>20.1B - 40B</span></span></p>
                    </td>
                    <td>
                        <p><span><span>$0.50 per Million Tokens</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>40.1B+</span></span></p>
                    </td>
                    <td>
                        <p><span><span>$0.75 per Million Tokens</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td rowspan="2">
                        <p><span><span>Embeddings</span></span></p>
                    </td>
                    <td rowspan="2">
                        <p><span><span>Tokens in</span></span></p>
                    </td>
                    <td>
                        <p><span><span>&lt;= 150M parameters</span></span></p>
                    </td>
                    <td>
                        <p><span><span>$0.008 per Million Tokens</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>151M+ parameters</span></span></p>
                    </td>
                    <td>
                        <p><span><span>$0.015 per Million Tokens</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>Speech-to-text</span></span></p>
                    </td>
                    <td>
                        <p><span><span>Audio seconds in</span></span></p>
                    </td>
                    <td>
                        <p><span><span>N/A</span></span></p>
                    </td>
                    <td>
                        <p><span><span>$0.0039 per minute of audio input</span></span></p>
                    </td>
                </tr>
            </tbody>
        </table>
    </figure>
</div><div>
    <figure>
        <table>
            <colgroup>
                <col></col>
                <col></col>
                <col></col>
                <col></col>
            </colgroup>
            <tbody>
                <tr>
                    <td>
                        <p><span><span><strong>Image Size</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>Model Type</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>Steps</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>Price</strong></span></span></p>
                    </td>
                </tr>
                <tr>
                    <td rowspan="2">
                        <p><span><span>&lt;=256x256</span></span></p>
                    </td>
                    <td>
                        <p><span><span>Standard</span></span></p>
                    </td>
                    <td>
                        <p><span><span>25</span></span></p>
                    </td>
                    <td>
                        <p><span><span>$0.00125 per 25 steps</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>Fast</span></span></p>
                    </td>
                    <td>
                        <p><span><span>5</span></span></p>
                    </td>
                    <td>
                        <p><span><span>$0.00025 per 5 steps</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td rowspan="2">
                        <p><span><span>&lt;=512x512</span></span></p>
                    </td>
                    <td>
                        <p><span><span>Standard</span></span></p>
                    </td>
                    <td>
                        <p><span><span>25</span></span></p>
                    </td>
                    <td>
                        <p><span><span>$0.0025 per 25 steps</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>Fast</span></span></p>
                    </td>
                    <td>
                        <p><span><span>5</span></span></p>
                    </td>
                    <td>
                        <p><span><span>$0.0005 per 5 steps</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>&lt;=1024x1024</span></span></p>
                    </td>
                    <td>
                        <p><span><span>Standard</span></span></p>
                    </td>
                    <td>
                        <p><span><span>25</span></span></p>
                    </td>
                    <td>
                        <p><span><span>$0.005 per 25 steps</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>Fast</span></span></p>
                    </td>
                    <td>
                        <p><span><span>5</span></span></p>
                    </td>
                    <td>
                        <p><span><span>$0.001 per 5 steps</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>&lt;=2048x2048</span></span></p>
                    </td>
                    <td>
                        <p><span><span>Standard</span></span></p>
                    </td>
                    <td>
                        <p><span><span>25</span></span></p>
                    </td>
                    <td>
                        <p><span><span>$0.01 per 25 steps</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>Fast</span></span></p>
                    </td>
                    <td>
                        <p><span><span>5</span></span></p>
                    </td>
                    <td>
                        <p><span><span>$0.002 per 5 steps</span></span></p>
                    </td>
                </tr>
            </tbody>
        </table>
    </figure>
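<p>As a rough illustration of the table above, the listed prices work out to the same per-step rate for Standard and Fast runs within each resolution tier. The sketch below estimates the cost of a single image under two assumptions the table does not confirm: that cost scales linearly with the step count, and that a tier applies when the longer side of the image fits within its limit. The function and constant names are hypothetical.</p>

```typescript
// Price per inference step in USD, derived from the table: tier price / steps
// (for example, $0.00125 / 25 steps and $0.00025 / 5 steps are both $0.00005).
// Each entry is [maximum side length in pixels, USD per step].
const PRICE_PER_STEP_USD: [number, number][] = [
  [256, 0.00005],
  [512, 0.0001],
  [1024, 0.0002],
  [2048, 0.0004],
];

// Assumption: cost is linear in the number of steps, and a tier applies when
// the longer side of the image fits within the tier's limit.
function estimateImageCostUsd(width: number, height: number, steps: number): number {
  const longerSide = Math.max(width, height);
  const tier = PRICE_PER_STEP_USD.find(([maxSide]) => maxSide >= longerSide);
  if (tier === undefined) {
    throw new Error(`No pricing tier covers ${width}x${height}`);
  }
  return tier[1] * steps;
}

// A Standard (25-step) image at 512x512 and a Fast (5-step) image at 1024x1024.
console.log(estimateImageCostUsd(512, 512, 25).toFixed(5));
console.log(estimateImageCostUsd(1024, 1024, 5).toFixed(5));
```

<p>Because the per-step rate is identical for both run types, choosing Fast over Standard at a given resolution reduces cost only by reducing the number of steps.</p>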
</div><p>We paused graduating models and announcing pricing for beta models over the past few months as we prepared for this new pricing change. We’ll be graduating all models to this new pricing, and billing will take effect on October 1, 2024.</p><p>Our free tier has been redone to fit these new metrics, and will include a monthly allotment of usage across all the task types.</p><div>
    <figure>
        <table>
            <colgroup>
                <col></col>
                <col></col>
            </colgroup>
            <tbody>
                <tr>
                    <td>
                        <p><span><span><strong>Model</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>Free tier size</strong></span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>Text Generation - LLM</span></span></p>
                    </td>
                    <td>
                        <p><span><span>10,000 tokens a day across any model size</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>Embeddings</span></span></p>
                    </td>
                    <td>
                        <p><span><span>10,000 tokens a day across any model size</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>Images</span></span></p>
                    </td>
                    <td>
                        <p><span><span>Sum of 250 steps, up to 1024x1024 resolution</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>Whisper</span></span></p>
                    </td>
                    <td>
                        <p><span><span>10 minutes of audio a day</span></span></p>
                    </td>
                </tr>
            </tbody>
        </table>
    </figure>
</div>
    <div>
      <h3>Optimizing AI workflows with AI Gateway</h3>
      <a href="#optimizing-ai-workflows-with-ai-gateway">
        
      </a>
    </div>
    
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6sLY6zUP6vDdnk1FNJfBBe/9a9e8df1f608b1540175302300ae9bc0/image7.png" />
          </figure><p><a href="https://developers.cloudflare.com/ai-gateway/"><u>AI Gateway</u></a> is designed to help developers and organizations building AI applications better monitor, control, and optimize their AI usage. Thanks to our users, AI Gateway has reached an incredible milestone: over 2 billion requests proxied by September 2024, less than a year after its inception. But we are not stopping there.</p><p><b>Persistent logs (open beta)</b></p><p><a href="https://developers.cloudflare.com/ai-gateway/observability/logging/"><u>Persistent logs</u></a> allow developers to store and analyze user prompts and model responses for extended periods, up to 10 million logs per gateway. Each request made through AI Gateway will create a log. With a log, you can see details of a request, including timestamp, request status, model, and provider.</p><p>We have revamped our logging interface to offer more detailed insights, including cost and duration. Users can now annotate logs with human feedback using thumbs up and thumbs down. Lastly, you can now filter, search, and tag logs with <a href="https://developers.cloudflare.com/ai-gateway/configuration/custom-metadata/"><u>custom metadata</u></a> to further streamline analysis directly within AI Gateway.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/18OovOZzlAkoKvMIgFJ1kR/dbb6b809fb063b2d918b2355cbf11ea3/image1.png" />
          </figure><p>Persistent logs are available to use on <a href="https://developers.cloudflare.com/ai-gateway/pricing/"><u>all plans</u></a>, with a free allocation for both free and paid plans. On the Workers Free plan, users can store up to 100,000 logs total across all gateways at no charge. For those needing more storage, upgrading to the Workers Paid plan will give you a higher free allocation — 200,000 logs stored total. Any additional logs beyond those limits will be available at $8 per 100,000 logs stored per month, giving you the flexibility to store logs for your preferred duration and do more with valuable data. Billing for this feature will be implemented when the feature reaches General Availability, and we’ll provide plenty of advance notice.</p><div>
    <figure>
        <table>
            <colgroup>
                <col></col>
                <col></col>
                <col></col>
                <col></col>
            </colgroup>
            <tbody>
                <tr>
                    <td> </td>
                    <td>
                        <p><span><span><strong>Workers Free</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>Workers Paid</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>Enterprise</strong></span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>Included Volume</span></span></p>
                    </td>
                    <td>
                        <p><span><span>100,000 logs stored (total)</span></span></p>
                    </td>
                    <td>
                        <p><span><span>200,000 logs stored (total)</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>Additional Logs</span></span></p>
                    </td>
                    <td>
                        <p><span><span>N/A</span></span></p>
                    </td>
                    <td>
                        <p><span><span>$8 per 100,000 logs stored per month</span></span></p>
                    </td>
                </tr>
            </tbody>
        </table>
    </figure>
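<p>To make the allocation concrete, here is a small sketch of how a monthly log storage bill could be estimated from the numbers above. It assumes that overage is billed in whole blocks of 100,000 logs (rounded up) and that the Workers Free plan cannot exceed its allocation, since the table lists N/A there. Neither assumption is confirmed by this post, and the names are hypothetical.</p>

```typescript
type Plan = "free" | "paid";

// Included log storage per plan, as listed in the table above.
const INCLUDED_LOGS = { free: 100000, paid: 200000 };
const USD_PER_EXTRA_BLOCK = 8;
const LOGS_PER_BLOCK = 100000;

// Assumption: overage is rounded up to whole blocks of 100,000 logs.
function monthlyLogStorageCostUsd(plan: Plan, logsStored: number): number {
  const extraLogs = Math.max(0, logsStored - INCLUDED_LOGS[plan]);
  if (extraLogs === 0) {
    return 0;
  }
  if (plan === "free") {
    // The table lists additional logs as N/A on Workers Free.
    throw new Error("Workers Free has no paid overage for stored logs");
  }
  return Math.ceil(extraLogs / LOGS_PER_BLOCK) * USD_PER_EXTRA_BLOCK;
}

// 450,000 stored logs on Workers Paid: 250,000 over the included volume.
console.log(monthlyLogStorageCostUsd("paid", 450000));
```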
</div><p><b>Export logs with Logpush</b></p><p>For users looking to export their logs, AI Gateway now supports log export via <a href="https://developers.cloudflare.com/ai-gateway/observability/logging/logpush"><u>Logpush</u></a>. With Logpush, you can automatically push logs out of AI Gateway into your preferred storage provider, including Cloudflare R2, Amazon S3, Google Cloud Storage, and more. This can be especially useful for compliance or advanced analysis outside the platform. Logpush follows its <a href="https://developers.cloudflare.com/workers/observability/logging/logpush/"><u>existing pricing model</u></a> and will be available to all users on a paid plan.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6uazGQNezknc5P9kVyr9gr/1da3b3897c9f6376ea4983b2d267b405/image2.png" />
          </figure><p><b>AI evaluations</b></p><p>We are also taking our first step towards comprehensive <a href="https://developers.cloudflare.com/ai-gateway/evaluations/"><u>AI evaluations</u></a>, starting with evaluation using human-in-the-loop feedback (this is now in open beta). Users can create datasets from logs to score and evaluate model performance, speed, and cost, initially focused on LLMs. Evaluations will allow developers to gain a better understanding of how their application is performing, ensuring better accuracy, reliability, and customer satisfaction. We’ve added support for <a href="https://developers.cloudflare.com/ai-gateway/observability/costs/"><u>cost analysis</u></a> across many new models and providers to enable developers to make informed decisions, including the ability to add <a href="https://developers.cloudflare.com/ai-gateway/configuration/custom-costs/"><u>custom costs</u></a>. Future enhancements will include automated scoring using LLMs, comparing performance of multiple models, and prompt evaluations, helping developers make decisions on what is best for their use case and ensuring their applications are both efficient and cost-effective.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5dyhxoR6KEsM8uh371XnDN/5eab93923157fd59112ffdea14b3bb2f/image3.png" />
          </figure>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/21DCTbhFEh7u4m1d0Tfgmn/2839e2ae7d226fdcc4086f108f5c9612/image6.png" />
          </figure>
    <div>
      <h3>Vectorize GA</h3>
      <a href="#vectorize-ga">
        
      </a>
    </div>
    
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/DjhP2xqOhPMP7oQK5Mdpa/c216167d0a204f344afd2ff7393d97f9/image4.png" />
          </figure><p>We've completely redesigned Vectorize since our <a href="https://blog.cloudflare.com/vectorize-vector-database-open-beta/"><u>initial announcement </u></a>in 2023 to better serve customer needs. Vectorize (v2) now supports<b> indexes of up to 5 million vectors</b> (up from 200,000), <b>delivers faster queries</b> (median latency is down 95% from 500 ms to 30 ms), and <b>returns up to 100 results per query</b> (increased from 20). These improvements significantly enhance Vectorize's capacity, speed, and depth of results.</p><p>Note: if you got started on Vectorize before GA, to ease the move from v1 to v2, a migration solution will be available in early Q4 — stay tuned!</p>
    <div>
      <h3>New Vectorize pricing</h3>
      <a href="#new-vectorize-pricing">
        
      </a>
    </div>
    <p>Not only have we improved performance and scalability, but we've also made Vectorize one of the most cost-effective options on the market. We've reduced query prices by 75% and storage costs by 98%.</p><div>
    <figure>
        <table>
            <colgroup>
                <col></col>
                <col></col>
                <col></col>
                <col></col>
            </colgroup>
            <tbody>
                <tr>
                    <td> </td>
                    <td>
                        <p><span><span><strong>New Vectorize pricing</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>Old Vectorize pricing</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>Price reduction</strong></span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span><strong>Writes</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>Free</span></span></p>
                    </td>
                    <td>
                        <p><span><span>Free</span></span></p>
                    </td>
                    <td>
                        <p><span><span>n/a</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span><strong>Query</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>$0.01 per 1 million vector dimensions</span></span></p>
                    </td>
                    <td>
                        <p><span><span>$0.04 per 1 million vector dimensions</span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>75%</strong></span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span><strong>Storage</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span>$0.05 per 100 million vector dimensions</span></span></p>
                    </td>
                    <td>
                        <p><span><span>$4.00 per 100 million vector dimensions</span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>98%</strong></span></span></p>
                    </td>
                </tr>
            </tbody>
        </table>
    </figure>
</div><p>You can learn more about our pricing in the <a href="https://developers.cloudflare.com/vectorize/platform/pricing/"><u>Vectorize docs</u></a>.</p><p><b>Vectorize free tier</b></p><p>There’s more good news: we’re introducing a free tier to Vectorize to make it easy to experiment with our full AI stack.</p><p>The free tier includes:</p><ul><li><p>30 million <b>queried</b> vector dimensions / month</p></li><li><p>5 million <b>stored</b> vector dimensions / month</p></li></ul>
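<p>The pricing above can be turned into a quick back-of-the-envelope estimate. The sketch below takes monthly totals of queried and stored vector dimensions as inputs (how those totals are metered is described in the Vectorize docs, not here), applies the new prices, and checks the result against the free tier. It also recomputes the price reductions from the table. All function and constant names are hypothetical.</p>

```typescript
// Prices and free-tier limits quoted in this section.
const QUERY_USD_PER_MILLION_DIMENSIONS = 0.01;
const STORAGE_USD_PER_100_MILLION_DIMENSIONS = 0.05;
const FREE_QUERIED_DIMENSIONS = 30 * 1e6;
const FREE_STORED_DIMENSIONS = 5 * 1e6;

// Monthly cost in USD for the given totals of queried and stored vector dimensions.
function vectorizeMonthlyCostUsd(queriedDimensions: number, storedDimensions: number): number {
  const queryCost = (queriedDimensions / 1e6) * QUERY_USD_PER_MILLION_DIMENSIONS;
  const storageCost = (storedDimensions / 1e8) * STORAGE_USD_PER_100_MILLION_DIMENSIONS;
  return queryCost + storageCost;
}

// True when both totals stay within the free tier listed above.
function fitsFreeTier(queriedDimensions: number, storedDimensions: number): boolean {
  return queriedDimensions <= FREE_QUERIED_DIMENSIONS && storedDimensions <= FREE_STORED_DIMENSIONS;
}

// Percentage saved when moving from the old price to the new price.
function priceReductionPercent(oldPrice: number, newPrice: number): number {
  return ((oldPrice - newPrice) / oldPrice) * 100;
}

// Example: 1 million stored vectors of 1536 dimensions and 500 million queried dimensions.
const exampleStoredDimensions = 1e6 * 1536;
console.log(vectorizeMonthlyCostUsd(500 * 1e6, exampleStoredDimensions).toFixed(3));
console.log(fitsFreeTier(500 * 1e6, exampleStoredDimensions));
console.log(priceReductionPercent(0.04, 0.01).toFixed(2), priceReductionPercent(4.0, 0.05).toFixed(2));
```

<p>The storage reduction computes to roughly 98.75%, which the table rounds down to 98%.</p>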
    <div>
      <h3>How fast is Vectorize?</h3>
      <a href="#how-fast-is-vectorize">
        
      </a>
    </div>
    <p>To measure performance, we ran benchmarks that executed a large number of vector similarity queries as quickly as possible, measuring both request latency and result precision. In this context, precision refers to the proportion of query results that match the known true-closest results for all benchmarked queries. This approach allows us to assess both the speed and accuracy of our vector similarity search capabilities. We benchmarked on the following datasets:</p><ul><li><p><a href="https://github.com/qdrant/vector-db-benchmark"><b><u>dbpedia-openai-1M-1536-angular</u></b></a>: 1 million vectors, 1536 dimensions, queried with cosine similarity at a top K of 10</p></li><li><p><a href="https://myscale.github.io/benchmark"><b><u>Laion-768-5m-ip</u></b></a>: 5 million vectors, 768 dimensions, queried with cosine similarity at a top K of 10</p><ul><li><p>We ran this again skipping the result-refinement pass to return approximate results faster</p></li></ul></li></ul><div>
    <figure>
        <table>
            <colgroup>
                <col></col>
                <col></col>
                <col></col>
                <col></col>
                <col></col>
                <col></col>
                <col></col>
            </colgroup>
            <tbody>
                <tr>
                    <td>
                        <p><span><span><strong>Benchmark dataset</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>P50 (ms)</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>P75 (ms)</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>P90 (ms)</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>P95 (ms)</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>Throughput (RPS)</strong></span></span></p>
                    </td>
                    <td>
                        <p><span><span><strong>Precision</strong></span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>dbpedia-openai-1M-1536-angular</span></span></p>
                    </td>
                    <td>
                        <p><span><span>31</span></span></p>
                    </td>
                    <td>
                        <p><span><span>56</span></span></p>
                    </td>
                    <td>
                        <p><span><span>159</span></span></p>
                    </td>
                    <td>
                        <p><span><span>380</span></span></p>
                    </td>
                    <td>
                        <p><span><span>343</span></span></p>
                    </td>
                    <td>
                        <p><span><span>95.4%</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>Laion-768-5m-ip </span></span></p>
                    </td>
                    <td>
                        <p><span><span>81.5</span></span></p>
                    </td>
                    <td>
                        <p><span><span>91.7</span></span></p>
                    </td>
                    <td>
                        <p><span><span>105</span></span></p>
                    </td>
                    <td>
                        <p><span><span>123</span></span></p>
                    </td>
                    <td>
                        <p><span><span>623</span></span></p>
                    </td>
                    <td>
                        <p><span><span>95.5%</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><span><span>Laion-768-5m-ip w/o refinement</span></span></p>
                    </td>
                    <td>
                        <p><span><span>14.7</span></span></p>
                    </td>
                    <td>
                        <p><span><span>19.3</span></span></p>
                    </td>
                    <td>
                        <p><span><span>24.3</span></span></p>
                    </td>
                    <td>
                        <p><span><span>27.3</span></span></p>
                    </td>
                    <td>
                        <p><span><span>698</span></span></p>
                    </td>
                    <td>
                        <p><span><span>78.9%</span></span></p>
                    </td>
                </tr>
            </tbody>
        </table>
    </figure>
</div><p>These benchmarks were conducted using a standard Vectorize v2 index, queried with a concurrency of 300 via a Cloudflare Worker binding. The reported latencies reflect those observed by the Worker binding querying the Vectorize index on warm caches, simulating the performance of an existing application with sustained usage.</p><p>Beyond Vectorize's fast query speeds, we believe the combination of Vectorize and Workers AI offers an unbeatable solution for delivering optimal AI application experiences. By running Vectorize close to the source of inference and user interaction, rather than combining AI and vector database solutions across providers, we can significantly minimize end-to-end latency.</p><p>With these improvements, we're excited to announce the general availability of the new Vectorize, which is more powerful, faster, and more cost-effective than ever before.</p>
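<p>For readers who want to reproduce this kind of measurement on their own data, the sketch below shows one way to compute the two metrics reported in the table. The precision function follows the definition given earlier (the share of returned results that are among the known true-closest results, across all queries). The percentile function uses the nearest-rank method, which is an assumption, since this post does not say how the latency percentiles were calculated. This is not the benchmark harness used for the numbers above.</p>

```typescript
// Share of returned IDs that are in the known true-closest set, over all queries.
function precision(returnedIds: string[][], trueClosestIds: string[][]): number {
  let matched = 0;
  let total = 0;
  returnedIds.forEach((ids, queryIndex) => {
    const truth = new Set(trueClosestIds[queryIndex]);
    matched += ids.filter((id) => truth.has(id)).length;
    total += ids.length;
  });
  return total === 0 ? 0 : matched / total;
}

// Nearest-rank percentile of a list of latencies in milliseconds.
// Assumption: the post does not state which percentile method was used.
function percentile(latenciesMs: number[], p: number): number {
  if (latenciesMs.length === 0) {
    throw new Error("No latency samples");
  }
  const sorted = latenciesMs.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Two queries at a top K of 3: 3 of the 6 returned IDs are true nearest neighbors.
const sampleReturned = [["a", "b", "c"], ["d", "e", "f"]];
const sampleTruth = [["a", "b", "x"], ["d", "y", "z"]];
console.log(precision(sampleReturned, sampleTruth));
console.log(percentile([10, 20, 30, 40], 50), percentile([10, 20, 30, 40], 95));
```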
    <div>
      <h3>Tying it all together: the AI platform for all your inference needs</h3>
      <a href="#tying-it-all-together-the-ai-platform-for-all-your-inference-needs">
        
      </a>
    </div>
    <p>Over the past year, we’ve been committed to building powerful AI products that enable users to build on us. While we are making advancements on each of these individual products, our larger vision is to provide a seamless, integrated experience across our portfolio.</p><p>With Workers AI and AI Gateway, users can easily enable analytics, logging, caching, and rate limiting to their AI application by connecting to AI Gateway directly through a binding in the Workers AI request. We imagine a future where AI Gateway can not only help you create and save datasets to use for fine-tuning your own models with Workers AI, but also seamlessly redeploy them on the same platform. A great AI experience is not just about speed, but also accuracy. While Workers AI ensures fast performance, using it in combination with AI Gateway allows you to evaluate and optimize that performance by monitoring model accuracy and catching issues, like hallucinations or incorrect formats. With AI Gateway, users can test out whether switching to new models in the Workers AI model catalog will deliver more accurate performance and a better user experience.</p><p>In the future, we’ll also be working on tighter integrations between Vectorize and Workers AI, where you can automatically supply context or remember past conversations in an inference call. 
This cuts down on the orchestration needed to run a <a href="https://www.cloudflare.com/learning/ai/retrieval-augmented-generation-rag/">RAG application</a>, because we can automatically query vector databases for you.</p><p>If we put the three products together, we imagine a world where you can build AI apps with <a href="https://www.cloudflare.com/learning/performance/what-is-observability/">full observability</a> (traces with AI Gateway) and see how the retrieval (Vectorize) and generation (Workers AI) components are working together, enabling you to diagnose issues and improve performance.</p><p>This Birthday Week, we’ve been focused on making sure our individual products are best-in-class, but we’re also continuing to invest in building a holistic AI platform, both within our AI portfolio and with the larger Developer Platform products. Our goal is to make sure that Cloudflare is the simplest, fastest, and most powerful place for you to build full-stack AI experiences with all the batteries included.</p>
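<p>As an illustration of the Workers AI and AI Gateway integration described above, here is a minimal sketch of routing a Workers AI inference call through a gateway from a Worker. The interface below is a simplified, hypothetical stand-in for the real binding type, and the gateway name is a placeholder. Check the Workers AI and AI Gateway docs for the exact options supported by the binding.</p>

```typescript
// Simplified, hypothetical shape of the Workers AI binding used in this sketch.
interface AiBinding {
  run(
    model: string,
    inputs: { prompt: string },
    options?: { gateway?: { id: string } },
  ): Promise<unknown>;
}

// Runs inference through the binding and routes the request via the named gateway,
// so that the gateway's analytics, logging, caching, and rate limiting apply.
async function runViaGateway(ai: AiBinding, gatewayId: string, prompt: string) {
  return ai.run("@cf/meta/llama-2-7b-chat-int8", { prompt }, { gateway: { id: gatewayId } });
}

// Inside a Worker's fetch handler, assuming a binding named AI is configured:
// const answer = await runViaGateway(env.AI, "my-gateway", "What is Cloudflare?");
```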
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6nXZn8qwK1tCVVMFbYFf7n/fe538bed97b00ef1b74a05dfd86eb496/image5.png" />
          </figure><p>We’re excited for you to try out all these new features! Take a look at our <a href="https://developers.cloudflare.com/products/?product-group=AI"><u>updated developer docs </u></a>on how to get started and the Cloudflare dashboard to interact with your account.</p> ]]></content:encoded>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Vectorize]]></category>
            <category><![CDATA[AI Gateway]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Workers AI]]></category>
            <guid isPermaLink="false">2lS9TcgZHa1fubO371mYiv</guid>
            <dc:creator>Michelle Chen</dc:creator>
            <dc:creator>Kathy Liao</dc:creator>
            <dc:creator>Phil Wittig</dc:creator>
            <dc:creator>Meaghan Choi</dc:creator>
        </item>
        <item>
            <title><![CDATA[AI Gateway is generally available: a unified interface for managing and scaling your generative AI workloads]]></title>
            <link>https://blog.cloudflare.com/ai-gateway-is-generally-available/</link>
            <pubDate>Wed, 22 May 2024 13:00:17 GMT</pubDate>
            <description><![CDATA[ AI Gateway is an AI ops platform that provides speed, reliability, and observability for your AI applications. With a single line of code, you can unlock powerful features including rate limiting, custom caching, real-time logs, and aggregated analytics across multiple providers ]]></description>
            <content:encoded><![CDATA[ <p></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5GsB2wwIevC3G2m0PGOAhz/d9eaeea0933d269b39fcda70c22881b7/image4-3.png" />
            
            </figure><p>During Developer Week in April 2024, we announced General Availability of <a href="/workers-ai-ga-huggingface-loras-python-support">Workers AI</a>, and today, we are excited to announce that AI Gateway is Generally Available as well. Since its launch to beta <a href="/announcing-ai-gateway">in September 2023 during Birthday Week</a>, we’ve proxied over 500 million requests and are now prepared for you to use it in production.</p><p>AI Gateway is an AI ops platform that offers a unified interface for managing and scaling your generative AI workloads. At its core, it acts as a proxy between your service and your inference provider(s), regardless of where your model runs. With a single line of code, you can unlock a set of powerful features focused on performance, security, reliability, and observability – think of it as your <a href="https://www.cloudflare.com/learning/network-layer/what-is-the-control-plane/">control plane</a> for your AI ops. And this is just the beginning – we have a roadmap full of exciting features planned for the near future, making AI Gateway the tool for any organization looking to get more out of their AI workloads.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6M6hDWXdRH2rZETQK4UlPe/444269e8d23056252e9e17aa08cef333/image6-1.png" />
            
            </figure>
    <div>
      <h2>Why add a proxy and why Cloudflare?</h2>
      <a href="#why-add-a-proxy-and-why-cloudflare">
        
      </a>
    </div>
    <p>The AI space moves fast, and it seems like every day there is a new model, provider, or framework. Given this high rate of change, it’s hard to keep track, especially if you’re using more than one model or provider. And that’s one of the driving factors behind launching AI Gateway – we want to provide you with a single consistent control plane for all your models and tools, even if they change tomorrow, and then again the day after that.</p><p>We've talked to a lot of developers and organizations building AI applications, and one thing is clear: they want more <a href="https://www.cloudflare.com/learning/performance/what-is-observability/">observability</a>, control, and tooling around their AI ops. This is something many of the AI providers are lacking as they are deeply focused on model development and less so on platform features.</p><p>Why choose Cloudflare for your AI Gateway? Well, in some ways, it feels like a natural fit. We've spent the last 10+ years helping build a better Internet by running one of the largest global networks, helping customers around the world with performance, reliability, and security – Cloudflare is used as a <a href="https://www.cloudflare.com/learning/cdn/glossary/reverse-proxy/">reverse proxy</a> by nearly 20% of all websites. With our expertise, it felt like a natural progression – change one line of code, and we can help with observability, reliability, and control for your AI applications – all in one control plane – so that you can get back to building.</p><p>Here is that one-line code change using the OpenAI JS SDK. Check out <a href="https://developers.cloudflare.com/ai-gateway/providers/">our docs</a> for other providers, SDKs, and languages.</p>
            <pre><code>import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: 'my api key', // defaults to process.env["OPENAI_API_KEY"]
  baseURL: "https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_slug}/openai"
});</code></pre>
            <p></p>
    <div>
      <h2>What’s included today?</h2>
      <a href="#whats-included-today">
        
      </a>
    </div>
    <p>After talking to customers, it was clear that we needed to focus on some foundational features before moving onto some of the more advanced ones. While we're really excited about what’s to come, here are the key features available in GA today:</p><p><b>Analytics</b>: Aggregate metrics from across multiple providers. See traffic patterns and usage including the number of requests, tokens, and costs over time.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3gFXixQSV6rVUM9V6ew1W4/db974469f45415b7ae0f0af45c30e7f3/pasted-image-0--10-.png" />
            
            </figure><p><b>Real-time logs:</b> Gain insight into requests and errors as you build.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/31KebDSmQfi9lW87mh3oZy/541a90575637dc860e1ef28972958ed4/image8-1.png" />
            
            </figure><p><b>Caching:</b> Enable custom caching rules and use Cloudflare’s cache for repeat requests instead of hitting the original model provider API, helping you save on cost and latency.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2bZw1HaJUP48B3MbXiATpx/0e7ee230a8b1c62e782efd466177fb5f/image1-10.png" />
            
            </figure><p><b>Rate limiting:</b> Control how your application scales by limiting the number of requests your application receives to control costs or prevent abuse.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4icXzN7Z8VuZw17KdzKl2X/60466c7cbe3869c14aa7a7ad90c40159/image5-9.png" />
            
            </figure><p><b>Support for your favorite providers:</b> AI Gateway now natively supports Workers AI plus 10 of the most popular providers, including <a href="https://x.com/CloudflareDev/status/1791204770394648901">Groq and Cohere</a> as of mid-May 2024.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1ORhtmLzCTOKLVrCyhyEZK/53be2a20c4d6bd7dd3cdcd2657ef6455/image2-10.png" />
            
            </figure><p><b>Universal endpoint:</b> In case of errors, improve resilience by defining <a href="https://developers.cloudflare.com/ai-gateway/configuration/fallbacks/">request fallbacks</a> to another model or inference provider.</p>
            <pre><code>curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_slug} -X POST \
  --header 'Content-Type: application/json' \
  --data '[
  {
    "provider": "workers-ai",
    "endpoint": "@cf/meta/llama-2-7b-chat-int8",
    "headers": {
      "Authorization": "Bearer {cloudflare_token}",
      "Content-Type": "application/json"
    },
    "query": {
      "messages": [
        {
          "role": "system",
          "content": "You are a friendly assistant"
        },
        {
          "role": "user",
          "content": "What is Cloudflare?"
        }
      ]
    }
  },
  {
    "provider": "openai",
    "endpoint": "chat/completions",
    "headers": {
      "Authorization": "Bearer {open_ai_token}",
      "Content-Type": "application/json"
    },
    "query": {
      "model": "gpt-3.5-turbo",
      "stream": true,
      "messages": [
        {
          "role": "user",
          "content": "What is Cloudflare?"
        }
      ]
    }
  }
]'</code></pre>
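            <p>The request above lists providers in the order they should be tried. The Python sketch below models that ordering on the client side, purely for illustration: <code>first_successful</code> and the <code>send</code> callback are hypothetical names, and the assumption that any 2xx status counts as success is ours, not a statement of how AI Gateway decides when to fall back.</p>
            <pre><code>from typing import Callable, Optional


def first_successful(
    steps: list[dict],
    send: Callable[[dict], tuple[int, str]],
) -> Optional[tuple[str, str]]:
    """Try each step in order; return (provider, body) for the first 2xx reply."""
    for step in steps:
        status, body = send(step)
        if status in range(200, 300):
            return step["provider"], body
    return None  # every step failed


# Same ordering as the curl example: Workers AI first, OpenAI as the fallback.
steps = [
    {"provider": "workers-ai", "endpoint": "@cf/meta/llama-2-7b-chat-int8"},
    {"provider": "openai", "endpoint": "chat/completions"},
]</code></pre>
            <p>If the <code>send</code> callback reports an error for the first step, the function moves on to the second; if every step fails, it returns <code>None</code> so the caller can surface the error.</p>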
    <div>
      <h2>What’s coming up?</h2>
      <a href="#whats-coming-up">
        
      </a>
    </div>
    <p>We've gotten a lot of feedback from developers, and there are some obvious things on the horizon such as persistent logs and custom metadata – foundational features that will help unlock the real magic down the road.</p><p>But let's take a step back for a moment and share our vision. At Cloudflare, we believe our platform is much more powerful as a unified whole than as a collection of individual parts. Applied to our AI products, this mindset means that they should be easy to use, combine, and run in harmony.</p><p>Let's imagine the following journey. You initially onboard onto Workers AI to run inference with the latest open-source models. Next, you enable AI Gateway to gain better visibility and control, and start storing persistent logs. Then you want to start tuning your inference results, so you leverage your persistent logs, our prompt management tools, and our built-in eval functionality. Now you're making analytical decisions to improve your inference results. With each data-driven improvement, you want more. So you implement our feedback API, which helps annotate inputs and outputs, in essence building a structured data set. At this point, you are one step away from a one-click fine-tune that can be deployed instantly to our global network, and it doesn't stop there. As you continue to collect logs and feedback, you can continuously rebuild your fine-tune adapters in order to deliver the best results to your end users.</p><p>This is all just an aspirational story at this point, but this is how we envision the future of AI Gateway and our AI suite as a whole. You should be able to start with the most basic setup and gradually progress into more advanced workflows, all without leaving <a href="https://www.cloudflare.com/ai-solution/">Cloudflare’s AI platform</a>. In the end, it might not look exactly as described above, but you can be sure that we are committed to providing the best AI ops tools to help make Cloudflare the best place for AI.</p>
    <div>
      <h2>How do I get started?</h2>
      <a href="#how-do-i-get-started">
        
      </a>
    </div>
    <p>AI Gateway is available to use today on all plans. If you haven’t yet used AI Gateway, check out our <a href="https://developers.cloudflare.com/ai-gateway/">developer documentation</a> and get started now. AI Gateway’s core features available today are offered for free, and all it takes is a Cloudflare account and one line of code to get started. In the future, more premium features, such as persistent logging and secrets management, will be available subject to fees. If you have any questions, reach out on our <a href="http://discord.cloudflare.com">Discord channel</a>.</p> ]]></content:encoded>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Open Source]]></category>
            <category><![CDATA[Workers AI]]></category>
            <category><![CDATA[Connectivity Cloud]]></category>
            <category><![CDATA[AI Gateway]]></category>
            <category><![CDATA[AI]]></category>
            <guid isPermaLink="false">3EErej51Xbc8xOYpGL8ggy</guid>
            <dc:creator>Kathy Liao</dc:creator>
            <dc:creator>Michelle Chen</dc:creator>
            <dc:creator>Phil Wittig</dc:creator>
        </item>
    </channel>
</rss>