Observability

Announcing Workers automatic tracing, now in open beta

2025-10-28

Cloudflare Workers' support for automatic tracing is now in open beta! Export traces to any OpenTelemetry-compatible provider for deeper application observability -- no code changes required...

Performance measurements… and the people who love them

2025-05-20

Internet Performance Latency Open Source Observability TTFB

Developers have a gut-felt understanding of performance, but that intuition breaks down when systems reach Cloudflare’s scale...

Kevin Guthrie

Moving Baselime from AWS to Cloudflare: simpler architecture, improved performance, over 80% lower cloud costs

2024-10-31

Observability Cloudflare Workers Developer Platform Performance

Post-acquisition, we migrated Baselime from AWS to the Cloudflare Developer Platform and in the process, we improved query times, simplified data ingestion, and now handle far more events, all while cutting costs. Here’s how we built a modern, high-performing observability platform on Cloudflare’s network. ...

Boris Tane

Adopting OpenTelemetry for our logging pipeline

2024-06-03

Observability Engineering

Recently, Cloudflare's Observability team undertook an effort to migrate our existing syslog-ng backed logging infrastructure to one backed by OpenTelemetry Collectors. In this post, we detail the process we followed and the difficulties we faced along the way...

Reclaiming CPU for free with Go's Profile Guided Optimization

2024-05-14

Observability Performance

Go 1.20 introduced support for Profile Guided Optimization (PGO) in the Go compiler. This post covers the process we created for experimenting with PGO at Cloudflare and measuring the CPU savings...

Colin Douch

April 05, 2024 3:50 PM

Cloudflare acquires Baselime to expand serverless application observability capabilities

Today, we’re thrilled to announce that Cloudflare has acquired Baselime, a serverless observability company...

Boris Tane, Rita Kozlov

Developers

April 04, 2024 1:05 PM

New tools for production safety — Gradual deployments, Source maps, Rate Limiting, and new SDKs

Today we are announcing five updates that put more power in your hands – Gradual Deployments, Source mapped stack traces in Tail Workers, a new Rate Limiting API, brand-new API SDKs, and updates to Durable Objects – each built with mission-critical production services in mind...

Tanushree Sharma, Jacob Bednarz

SDK

March 29, 2024 1:00 PM

Minimizing on-call burnout through alerts observability

Learn how Cloudflare used open-source tools to enhance alert observability, leading to increased resilience and improved on-call team well-being...

Monika Singh

Observability, Developers, Developer Platform, Prometheus, Alertmanager

January 24, 2024 2:00 PM

Introducing Foundations - our open source Rust service foundation library

Foundations is a foundational Rust library, designed to help scale programs for distributed, production-grade systems...

Ivan Nikulin

Open Source, Rust, Observability, Security, Oxy

January 08, 2024 2:00 PM

An overview of Cloudflare's logging pipeline

In this post, we’re going to go over what that looks like, how we achieve high availability, and how we meet our Service Level Objectives (SLOs) while shipping close to a million log lines per second...

Colin Douch

Observability, Logs

September 28, 2023 1:00 PM

Cloudflare Integrations Marketplace introduces three new partners: Sentry, Momento and Turso

We introduced integrations with Supabase, PlanetScale, Neon and Upstash. Today, we are thrilled to introduce our newest additions to Cloudflare’s Integrations Marketplace – Sentry, Turso and Momento...

Tanushree Sharma

Developers

March 03, 2023 2:00 PM

How Cloudflare runs Prometheus at scale

Here at Cloudflare we run over 900 instances of Prometheus with a total of around 4.9 billion time series. Operating such a large Prometheus deployment doesn’t come without challenges. In this blog post we’ll cover some of the issues we hit and how we solved them...

Lukasz Mierzwa

Prometheus, Observability, Open Source, Deep Dive

January 24, 2023 2:00 PM

Intelligent, automatic restarts for unhealthy Kafka consumers

At Cloudflare, we take steps to ensure we are resilient against failure at all levels of our infrastructure. This includes Kafka, which we use for critical workflows such as sending time-sensitive emails and alerts....

Chris Shepherd, Andrea Medda

Kafka, Observability, Go, Kubernetes

September 28, 2022 1:00 PM

Monitor your own network with free network flow analytics from Cloudflare

Cloudflare is excited to announce that we are releasing a free version of Magic Network Monitoring (previously called Flow Based Monitoring). Magic Network Monitoring receives network flow data from a customer’s router(s) and provides network traffic analytics via Cloudflare’s...

Chris Draper

Birthday Week, Free, Magic Network Monitoring, Network, Observability

May 19, 2022 3:39 PM

Monitoring our monitoring: how we validate our Prometheus alert rules

Pint is a tool we developed to validate our Prometheus alerting rules and ensure they are always working...

Lukasz Mierzwa

Monitoring, Prometheus, Observability, Speed & Reliability

April 13, 2021 1:00 PM

Expanding the Cloudflare Workers Observability Ecosystem

Cloudflare adds Datadog, Honeycomb, New Relic, Sentry, Splunk, and Sumo Logic as observability partners to the Cloudflare Workers Ecosystem...

Steven Pack, Erwin van der Koogh

Developer Week, Developers, Cloudflare Workers, Partners, Observability

January 14, 2021 12:00 PM

Soar: Simulation for Observability, reliAbility, and secuRity

In this article, we will discuss one of the techniques we use to fight such software complexity: simulations. Simulations are basically system tests that run with synthesized customer traffic and applications....

Yan Zhai

Observability, Security, Speed & Reliability

The Cloudflare Blog
