Pages solves a similar problem, and we were initially inclined to expand our existing architecture and tech stack, which includes a centralized configuration plane built on Go in Kubernetes. We also considered the ways in which the Workers ecosystem has evolved in the four years since Pages launched — we have since launched so many more tools built for use cases just like this!
The distributed nature of Workers offers some advantages over a centralized stack — we can spend less time configuring Kubernetes because Workers automatically handles failover and scaling. Ultimately, we decided to keep using what required no additional work to re-use from Pages (namely, the system for connecting GitHub/GitLab accounts to Cloudflare, and ingesting push events from them), and for the rest build out a new architecture on the Workers platform, with reliability and minimal latency in mind.
We didn’t need to change the system that connects GitHub/GitLab accounts to Cloudflare and ingests their push events. That left us with two systems to build: the configuration plane, where users connect a Worker to a repo, and a build management system to run and monitor builds.
We can begin with our configuration plane, which consists of a simple Client Worker that implements a RESTful API (using Hono) and connects to a PostgreSQL database. It’s in this database that we store build configurations for our users, and through this Worker that users can view and manage their builds.
We considered a more distributed data model (like D1, sharded by account), but ultimately decided that keeping our database in a datacenter more easily fit our use case. The Workers Builds data model is relational (Workers belong to Cloudflare Accounts, and Builds belong to Workers), and build metadata must be consistent in order to properly manage build queues. We chose to keep our failover-ready database in a centralized datacenter and take advantage of two other Workers products, Smart Placement and Hyperdrive, in order to keep the benefits of a distributed control plane.
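The relational shape described above can be sketched in TypeScript. This is a minimal in-memory illustration, not the actual schema: the row types, field names, status values, and the rule that a Worker runs one build at a time are all assumptions made for the example. It shows why the queue needs a consistent view of build metadata: deciding what to start next depends on every other build of the same Worker.

```typescript
// Hypothetical row shapes; the real schema is not shown in this post.
type BuildStatus = 'queued' | 'running' | 'done'

interface AccountRow { accountId: string }
interface WorkerRow { workerId: string; accountId: string } // Workers belong to Accounts
interface BuildRow { buildId: string; workerId: string; status: BuildStatus; createdAt: number } // Builds belong to Workers

// Assumed queueing rule, for illustration only: a Worker runs one build at a
// time, so the next build to start is the oldest queued build of each idle Worker.
function nextBuildsToStart(builds: BuildRow[]): BuildRow[] {
  const busy = new Set(builds.filter((b) => b.status === 'running').map((b) => b.workerId))
  const oldestQueued = new Map<string, BuildRow>()
  for (const b of builds) {
    if (b.status !== 'queued' || busy.has(b.workerId)) continue
    const current = oldestQueued.get(b.workerId)
    if (!current || b.createdAt < current.createdAt) oldestQueued.set(b.workerId, b)
  }
  return Array.from(oldestQueued.values())
}
```

If two readers saw different versions of the `running` rows, they could both decide to start a build for the same Worker, which is the kind of inconsistency a single relational database avoids.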
Everything that you see in the Cloudflare Dashboard related to Workers Builds is served by this Worker.
The more challenging problem we faced was how to run and manage user builds effectively. We wanted to support the same experience that we had achieved with Pages, which led to these key requirements:
Builds should be initiated with minimal latency.
The status of a build should be tracked and displayed through its entire lifecycle, starting when a user pushes a commit.
Customer build logs should be stored in a secure, private, and long-lived way.
To solve these problems, we leaned heavily into the technology of Durable Objects (DO).
We created a Build Management Worker with two DO classes: a Scheduler class to manage the scheduling of builds, and a class called BuildBuddy to manage individual builds. We chose this design to keep the system efficient and scalable. Since each build is assigned its own build manager DO, its operation won’t ever block other builds or the scheduler, meaning we can start up builds with minimal latency. Below, we dive into each of these Durable Object classes.
Scheduler DO
The Scheduler DO class is relatively simple. Using Durable Objects Alarms, it is triggered every second to pull up a list of user build configurations that are ready to be started. For each of those builds, the Scheduler creates an instance of our other DO class, the Build Buddy.
import { DurableObject } from 'cloudflare:workers'
// PQueue is assumed to come from the p-queue package
import PQueue from 'p-queue'

// Bindings, Builds and getBuildBuddy() are defined elsewhere in the project.
export class BuildScheduler extends DurableObject<Bindings> {
  constructor(ctx: DurableObjectState, env: Bindings) {
    super(ctx, env)
  }

  // The DO alarm handler will be called every second to fetch builds
  async alarm(): Promise<void> {
    // Set the alarm to run again in 1 second
    await this.updateAlarm()

    const builds = await this.getBuildsToSchedule()
    await this.scheduleBuilds(builds)
  }

  async scheduleBuilds(builds: Builds[]): Promise<void> {
    // Nothing to do if there are no builds to schedule
    if (builds.length === 0) return

    // Start at most 6 builds concurrently
    const queue = new PQueue({ concurrency: 6 })
    // Begin running builds
    builds.forEach((build) =>
      queue.add(async () => {
        // The BuildBuddy is another DO described more in the next section!
        const bb = getBuildBuddy(this.env, build.build_id)
        await bb.startBuild(build)
      })
    )

    await queue.onIdle()
  }

  async getBuildsToSchedule(): Promise<Builds[]> {
    // returns list of builds to schedule
  }

  async updateAlarm(): Promise<void> {
    // We want to ensure we aren't running multiple alarms at once, so we only
    // set the next alarm if there isn’t already one set.
    const existingAlarm = await this.ctx.storage.getAlarm()
    if (existingAlarm === null) {
      await this.ctx.storage.setAlarm(Date.now() + 1000)
    }
  }
}
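The guard in updateAlarm() (only set the next alarm if none is pending) can be exercised in isolation. The sketch below is an assumption-laden model, not the Durable Objects API: FakeAlarmStorage is a hypothetical synchronous stand-in for the real, Promise-based storage calls, and ensureAlarm is a hypothetical helper that mirrors the logic of updateAlarm().

```typescript
// In-memory stand-in for the two alarm calls used by the scheduler.
// Only getAlarm() and setAlarm() are modelled, and synchronously;
// the real storage API returns Promises.
class FakeAlarmStorage {
  private alarmAt: number | null = null

  getAlarm(): number | null {
    return this.alarmAt
  }

  setAlarm(time: number): void {
    this.alarmAt = time
  }
}

// Schedule the next tick only if none is pending, mirroring updateAlarm().
// Returns the time at which the alarm is due.
function ensureAlarm(storage: FakeAlarmStorage, now: number, intervalMs: number = 1000): number {
  const existing = storage.getAlarm()
  if (existing !== null) return existing
  storage.setAlarm(now + intervalMs)
  return now + intervalMs
}
```

Calling ensureAlarm twice in a row leaves a single alarm in place, so repeated calls cannot pile up overlapping ticks.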
Build Buddy DO
The Build Buddy DO class is what we use to manage each individual build from the time it begins initializing to when it is stopped. Every build has a buddy for life!
Upon creation of a Build Buddy DO instance, the Scheduler immediately calls startBuild() on the instance. The startBuild() method is responsible for fetching all metadata and secrets needed to run a build, and then kicking off a build on Cloudflare’s container platform (not public yet, but coming soon!).
As the containerized build runs, it reports back to the Build Buddy, sending status updates and logs for the Build Buddy to deal with.
Build status
As a build progresses, it reports its own status back to the Build Buddy, sending updates when it has finished initializing, has completed successfully, or has been terminated by the user. The Build Buddy handles this incoming information from the containerized build and writes status updates to the database (via a Hyperdrive binding) so that users can see the status of their build in the Cloudflare dashboard.
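One way to keep incoming status updates well-behaved is to check each reported status against a small table of allowed transitions before writing it. The sketch below is illustrative only: the status names and the transition rules are assumptions, since the post does not list the real ones, and applyStatus is a hypothetical helper.

```typescript
// Hypothetical status names, loosely based on the lifecycle described above.
type Status = 'queued' | 'initializing' | 'running' | 'succeeded' | 'failed' | 'canceled'

// Which statuses may follow each status. Terminal states allow nothing.
const allowedTransitions: Record<Status, Status[]> = {
  queued: ['initializing', 'canceled'],
  initializing: ['running', 'failed', 'canceled'],
  running: ['succeeded', 'failed', 'canceled'],
  succeeded: [],
  failed: [],
  canceled: [],
}

// Return the status to persist: the reported one if the transition is allowed,
// otherwise the current one (late or duplicate updates are ignored).
function applyStatus(current: Status, reported: Status): Status {
  return allowedTransitions[current].indexOf(reported) !== -1 ? reported : current
}
```

With a table like this, a late "running" report that arrives after a build has already succeeded would not overwrite the final status shown to the user.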
Build logs
A running build generates output logs that are important to store and surface to the user. The containerized build flushes these logs to the Build Buddy every second, which, in turn, stores those logs in DO storage.
The decision to use Durable Object storage here makes it easy to multicast logs to multiple clients efficiently, and allows us to use the same API for both streaming logs and viewing historical logs.
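The "same API for streaming and history" idea can be illustrated with a cursor-based read. This is a minimal in-memory sketch under stated assumptions: LogStore is a hypothetical class that keeps lines in an array rather than in Durable Object storage, and the cursor is simply a line index.

```typescript
interface LogPage {
  lines: string[]
  nextCursor: number
}

class LogStore {
  private lines: string[] = []

  // Called on each flush from the build (once per second in the post).
  append(batch: string[]): void {
    this.lines.push(...batch)
  }

  // Return everything after `cursor`. A cursor of 0 reads the full history;
  // passing the previous nextCursor returns only what is new since then.
  read(cursor: number = 0): LogPage {
    return { lines: this.lines.slice(cursor), nextCursor: this.lines.length }
  }
}
```

A client that is following a live build keeps calling read() with the last nextCursor, while a client opening a finished build calls read(0) once. Several clients can read independently because each one holds its own cursor.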
Now that we've gone over the core behavior of the Workers Builds control plane, we'd like to detail a few other features of the Workers platform that we use to improve performance, monitor system health, and troubleshoot customer issues.
While our control plane is distributed in the sense that it can be run across multiple datacenters, to reduce latency costs, we want most requests to be served from locations close to our primary database in the western US.
While a build is running, Build Buddy, a Durable Object, is continuously writing status updates to our database. For the Client and the Build Management API Workers, we enabled Smart Placement with location hints to ensure requests run close to the database.
This graph shows the reduction in round trip time (RTT) observed for our Worker with Smart Placement turned on.
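The reasoning behind running close to the database comes down to simple arithmetic: a request pays the user-to-Worker round trip once, but pays the Worker-to-database round trip once per sequential query. The numbers below are hypothetical and chosen only to show the shape of the trade-off; they are not measurements from our system.

```typescript
// Total request time when a Worker makes `queries` sequential database queries.
function requestLatencyMs(userToWorkerRtt: number, workerToDbRtt: number, queries: number): number {
  return userToWorkerRtt + queries * workerToDbRtt
}

// Hypothetical round trip times in milliseconds, for a user far from the database.
const nearUser = requestLatencyMs(10, 150, 4) // Worker runs close to the user: 10 + 4 * 150 = 610
const nearDb = requestLatencyMs(150, 5, 4) // Worker runs close to the database: 150 + 4 * 5 = 170
```

The more queries a request makes, the more the placement near the database pays off.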
We needed a logging tool that allows us to aggregate and search across persistent operational logs from our Workers to assist with identifying and troubleshooting issues. We worked with the Workers Observability team to become early adopters of Workers Logs.
Workers Logs worked out of the box, giving us fast, easy-to-use logs directly within the Cloudflare dashboard. To improve our ability to search logs, we created a tagging library that lets us attach metadata to each log, such as the git tag of the deployed Worker that the log comes from, so that we can filter logs by release.
See a shortened example below for how we handle and log errors on the Client Worker.
// client-worker-app.ts

import { Hono } from 'hono'
import type { Context } from 'hono'
import { useWorkersLogger } from 'workers-tagged-logger'

// HonoContext, logger, getBuildsByIds and internal_error are defined
// elsewhere and omitted from this shortened example.

// The Client Worker is a RESTful API built with Hono
const app = new Hono<HonoContext>()
  // This is from the workers-tagged-logger library - first we register the logger
  .use(useWorkersLogger('client-worker-app'))
  // If any error happens during execution, this middleware will ensure we log the error
  .onError(useOnError)
  // routes
  .get(
    '/apiv4/builds',
    async (c) => {
      const { ids } = c.req.query()
      return await getBuildsByIds(c, ids)
    }
  )

function useOnError(e: Error, c: Context<HonoContext>): Response {
  // Set the release tag on the log
  logger.setTags({ release: c.env.GIT_TAG })

  // Write a log at level 'error'. Can also log 'info', 'log', 'warn', and 'debug'
  logger.error(e)
  return c.json(internal_error.toJSON(), internal_error.statusCode)
}
With this setup, each error logged by the Client Worker shows up in our Workers Logs dashboard with the release tag set on the log.
We can get a better sense of the impact of the error by adding filters to the Workers Logs view. We are able to filter on any of the fields since we’re logging with structured JSON.
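The idea behind tagged, structured logs can be shown without any library. The sketch below is not the workers-tagged-logger API: makeLogger and filterByTag are hypothetical helpers, the entry shape is an assumption, and logs are collected in an array instead of being sent to Workers Logs.

```typescript
interface LogEntry {
  level: 'debug' | 'info' | 'log' | 'warn' | 'error'
  message: string
  tags: Record<string, string>
}

// Build a logger that stamps every entry with a fixed set of tags
// (for example the release) and writes it out as one JSON line.
function makeLogger(baseTags: Record<string, string>, sink: string[]) {
  return (level: LogEntry['level'], message: string, tags: Record<string, string> = {}): void => {
    const entry: LogEntry = { level, message, tags: { ...baseTags, ...tags } }
    sink.push(JSON.stringify(entry))
  }
}

// Because each line is structured JSON, filtering on any tag is a simple lookup.
function filterByTag(lines: string[], key: string, value: string): LogEntry[] {
  return lines.map((line) => JSON.parse(line) as LogEntry).filter((entry) => entry.tags[key] === value)
}
```

Filtering the collected lines on `release` then returns only the entries written by that release, which is the same kind of query we run in the Workers Logs view.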
Coming soon to Workers Builds is build caching, used to store artifacts of a build for subsequent builds to reuse, such as package dependencies and build outputs. Build caching can speed up customer builds by avoiding the need to redownload dependencies from NPM or to rebuild projects from scratch. The cache itself will be backed by R2 storage.
We were able to build up a great testing story using Vitest and workerd: unit tests, cross-worker integration tests, the works. In the example below, we make use of the runInDurableObject helper from cloudflare:test to test instance methods on the Scheduler DO directly.
// scheduler.spec.ts

import { env, runInDurableObject } from 'cloudflare:test'
import { expect, test } from 'vitest'
import { BuildScheduler } from './scheduler'

// `harness` is the test harness mentioned below; its setup is not shown.

test('getBuildsToSchedule() runs a queued build', async () => {
  // Our test harness creates a single build for our scheduler to pick up
  const { build } = await harness.createBuild()

  // We create a scheduler DO instance
  const id = env.BUILD_SCHEDULER.idFromName(crypto.randomUUID())
  const stub = env.BUILD_SCHEDULER.get(id)
  await runInDurableObject(stub, async (instance: BuildScheduler) => {
    expect(instance).toBeInstanceOf(BuildScheduler)

    // We check that the scheduler picks up 1 build
    const builds = await instance.getBuildsToSchedule()
    expect(builds.length).toBe(1)

    // We start the build, which should mark it as running
    await instance.scheduleBuilds(builds)
  })

  // Check that there are no more builds to schedule
  const queuedBuilds = ...
  expect(queuedBuilds.length).toBe(0)
})
We use SELF.fetch() from cloudflare:test to run integration tests on our Client Worker, as shown below. This integration test covers our Hono endpoint and database queries made by the Client Worker in retrieving the metadata of a build.
// builds_api.test.ts

import { env, SELF } from 'cloudflare:test'
import { expect, it } from 'vitest'

// `harness` is the test harness mentioned below; its setup is not shown.

it('correctly selects a single build', async () => {
  // Our test harness creates a randomized build to test with
  const { build } = await harness.createBuild()

  // We send a request to the Client Worker itself to fetch the build metadata
  const getBuild = await SELF.fetch(
    `https://example.com/builds/${build.build_uuid}`,
    {
      method: 'GET',
      headers: new Headers({
        Authorization: `Bearer JWT`,
        'content-type': 'application/json',
      }),
    }
  )

  // We expect to receive a 200 response from our request and for the
  // build metadata returned to match that of the random build that we created
  expect(getBuild.status).toBe(200)
  const getBuildV4Resp = await getBuild.json()
  const buildResp = getBuildV4Resp.result
  expect(buildResp).toBeTruthy()
  expect(buildResp).toEqual(build)
})
These tests run on the same runtime that Workers run on in production, meaning we have greater confidence that any code changes will behave as expected when they go live.
We use the technology underlying the Workers Analytics Engine to collect all of the metrics for our system. We set up Grafana dashboards to display these metrics.
JavaScript-native RPC was added to Workers in April of 2024, and it’s pretty magical. In the scheduler code example above, we call startBuild() on the BuildBuddy DO from the Scheduler DO. Without RPC, we would need to stand up routes on the BuildBuddy fetch() handler for the Scheduler to trigger with a fetch request. With RPC, there is almost no boilerplate — all we need to do is call a method on a class.
const bb = getBuildBuddy(this.env, build.build_id)

// Starting a build without RPC 😢
await bb.fetch('http://do/api/start_build', {
  method: 'POST',
  body: JSON.stringify(build),
})

// Starting a build with RPC 😸
await bb.startBuild(build)
By using Workers and Durable Objects, we were able to build a complex and distributed system that is easy to understand and is easily scalable.
It’s been a blast for our team to build on top of the very platform that we work on, something that would have been much harder to achieve on Workers just a few years ago. We believe in being Customer Zero for our own products: identifying pain points firsthand and continuously improving the developer experience by applying our products to our own use cases. It was fulfilling to have our needs as developers met by other teams and then see those tools quickly become available to the rest of the world. We were collaborators and internal testers for Workers Logs and private network support for Hyperdrive (both released during Birthday Week), and for the soon-to-be-released container platform.
Opportunities to build complex applications on the Developer Platform have increased in recent years as the platform has matured and expanded product offerings for more use cases. We hope that Workers Builds will be yet another tool in the Workers toolbox that enables developers to spend less time thinking about configuration and more time writing code.
Want to try it out? Check out the docs to learn more about how to deploy your first project with Workers Builds.