This section is left intentionally blank to allow locomotive enthusiasts who want to find the train on their own to do so first.
Remember my comment about straces being gibberish? Actually, I was kidding. So there are a few ways to make it more manageable, and with experience you'll learn which system calls to pay attention to, such as execve, chdir, open, read, fork, and signals, and which ones to skim over, such as mprotect,mmap, and futex.
Since I'm writing this account after the fact, let's cheat a little and assume I was super smart and zeroed in on execve correctly on the first try:
\n
🌈 ~ zgrep execve trace.log.gz | head\nexecve("/home/yew/.nvm/versions/node/v18.20.6/bin/yarn", ["yarn", "test", "steam-regulator"], 0x7ffdff573148 /* 72 vars */) = 0\nexecve("/home/yew/.pyenv/shims/node", ["node", "/home/yew/.nvm/versions/node/v18"..., "test", "steam-regulator"], 0x7ffd64f878c8 /* 72 vars */) = -1 ENOENT (No such file or directory)\nexecve("/home/yew/.pyenv/bin/node", ["node", "/home/yew/.nvm/versions/node/v18"..., "test", "steam-regulator"], 0x7ffd64f878c8 /* 72 vars */) = -1 ENOENT (No such file or directory)\nexecve("/home/yew/repos/secrets/bin/node", ["node", "/home/yew/.nvm/versions/node/v18"..., "test", "steam-regulator"], 0x7ffd64f878c8 /* 72 vars */) = -1 ENOENT (No such file or directory)\nexecve("/home/yew/.nvm/versions/node/v18.20.6/bin/node", ["node", "/home/yew/.nvm/versions/node/v18"..., "test", "steam-regulator"], 0x7ffd64f878c8 /* 72 vars */) = 0\n[pid 49307] execve("/bin/sh", ["/bin/sh", "-c", "backstage-cli repo test resource"...], 0x3d17d6d0 /* 156 vars */ <unfinished ...>\n[pid 49307] <... execve resumed>) \t= 0\n[pid 49308] execve("/home/yew/cloudflare/repos/backstage/node_modules/.bin/backstage-cli", ["backstage-cli", "repo", "test", "steam-regulator"], 0x5e7ef80051d8 /* 156 vars */ <unfinished ...>\n[pid 49308] <... execve resumed>) \t= 0\n[pid 49308] execve("/tmp/yarn--1742459197616-0.9027914591640542/node", ["node", "/home/yew/cloudflare/repos/backs"..., "repo", "test", "steam-regulator"], 0x7ffcc18af270 /* 156 vars */) = 0\n🌈 ~ zgrep execve trace.log.gz | wc -l\n2254\n
\n
Phew, 2,000 is a lot of execves . Let's get the unique ones, plus their counts:
Have you spotted the Steam Locomotive yet? I spotted it immediately because this is My Own System and Surely This Means I Am Perfectly Aware Of Everything That Is Installed Unlike, er, node_modules.
sl is actually afun little joke program from 1993 that plays on users' tendencies to make a typo on ls. When sl runs, it clears your terminal to make way for an animated steam locomotive to come chugging through.
When I first saw that Jest was running sl so many times, my first thought was to ask my colleague if sl is a valid command on his Mac, and of course it is not. After all, which serious engineer would stuff their machine full of silly commands like sl, gti, cowsay, or toilet ? The next thing I tried was to rename sl to something else, and sure enough all my problems disappeared: yarn test started working perfectly.
\n
\n
So what does Jest have to do with Steam Locomotives?
Nothing, that's what. The whole affair is an unfortunate naming clash between sl the Steam Locomotive and sl theSapling CLI. Jest wanted sl the source control system, but ended up getting steam-rolled by sl the Steam Locomotive.
I don't know. Actually I'm not sure if a forked child executing sl still has a terminal anymore, but the travel time of the train does depend on the terminal width. The wider it is, the longer it takes:
\n
🌈 ~ tput cols\n425\n🌈 ~ time sl\nsl 0.19s user 0.06s system 1% cpu 20.629 total\n🌈 ~ tput cols\n58\n🌈 ~ time sl \nsl 0.03s user 0.01s system 0% cpu 5.695 total
\n
So the first thing I tried was to run yarn test in a ridiculously narrow terminal and see what happens:
\n
Determin\ning test\n suites \nto run..\n. \n \n ● Test\n suite f\nailed to\n run \n \nthrown: \n[Error] \n \nerror Co\nmmand fa\niled wit\nh exit c\node 1. \ninfo Vis\nit https\n://yarnp\nkg.com/e\nn/docs/c\nli/run f\nor docum\nentation\n about t\nhis comm\nand. \nyarn tes\nt 1.92s\n user 0.\n67s syst\nem 9% cp\nu 27.088\n total \n🌈 back\nstage [m\naster] t\nput cols\n \n8
\n
Alas, the terminal width doesn't affect jest at all. Jest callssl via execa so let's mock that up locally:
\n
🌈 choochoo cat runSl.mjs \nimport {execa} from 'execa';\nconst { stdout } = await execa('tput', ['cols']);\nconsole.log('terminal colwidth:', stdout);\nawait execa('sl', ['root']);\n🌈 choochoo time node runSl.mjs\nterminal colwidth: 80\nnode runSl.mjs 0.21s user 0.06s system 4% cpu 6.730 total
\n
So execa uses the default terminal width of 80, which takes the train 6.7 seconds to cross. And 27 seconds divided by 6.7 is awfully close to 4. So is Jest running sl 4 times? Let's do a poor man's bpftrace by hooking into sl like so:
The logs were emitted for about 26.35 seconds, which is close to 27. It probably crashed just as wave4 was reporting back. And each wave lasts about 6.7 seconds, right on the money with manual measurement.
\n
\n
So why is Jest running sl in 4 waves? Why did it crash at the start of the 5th wave?
From the results we can see that the 5 workers are busy executing sl root, which corresponds to the getRoot() function in jest-change-files/sl.ts
\n
2025-03-21 05:50:22.663263304 started: root at /home/yew/cloudflare/repos/backstage/packages/app/src\n2025-03-21 05:50:22.665550470 started: root at /home/yew/cloudflare/repos/backstage/packages/backend/src\n2025-03-21 05:50:22.667988509 started: root at /home/yew/cloudflare/repos/backstage/plugins/access/src\n2025-03-21 05:50:22.671781519 started: root at /home/yew/cloudflare/repos/backstage/plugins/backstage-components/src\n2025-03-21 05:50:22.673690514 started: root at /home/yew/cloudflare/repos/backstage/plugins/backstage-entities/src\n2025-03-21 05:50:29.247573899 started: root at /home/yew/cloudflare/repos/backstage/plugins/catalog-types-common/src\n2025-03-21 05:50:29.251173536 started: root at /home/yew/cloudflare/repos/backstage/plugins/cross-connects/src\n2025-03-21 05:50:29.255263605 started: root at /home/yew/cloudflare/repos/backstage/plugins/cross-connects-backend/src\n2025-03-21 05:50:29.257293780 started: root at /home/yew/cloudflare/repos/backstage/plugins/pingboard-backend/src\n2025-03-21 05:50:29.260285783 started: root at /home/yew/cloudflare/repos/backstage/plugins/resource-insights/src\n2025-03-21 05:50:35.823374079 started: root at /home/yew/cloudflare/repos/backstage/plugins/scaffolder-backend-module-gaia/src\n2025-03-21 05:50:35.825418386 started: root at /home/yew/cloudflare/repos/backstage/plugins/scaffolder-backend-module-r2/src\n2025-03-21 05:50:35.829963172 started: root at /home/yew/cloudflare/repos/backstage/plugins/security-scorecard-dash/src\n2025-03-21 05:50:35.832597778 started: root at /home/yew/cloudflare/repos/backstage/plugins/slo-directory/src\n2025-03-21 05:50:35.834631869 started: root at /home/yew/cloudflare/repos/backstage/plugins/software-excellence-dashboard/src\n2025-03-21 05:50:42.404063080 started: root at /home/yew/cloudflare/repos/backstage/plugins/teamcity/src
\n
The 16 entries here correspond neatly to the 16 rootDirs configured in Jest for Cloudflare's backstage. We have 5 trains, and we want to visit 16 stations so let's do some simple math. 16/5.0 = 3.2 which means our trains need to go back and forth 4 times at a minimum to cover them all.
Let's go back to the very start of our journey. The original [Error] thrown was actuallyfrom here and after modifying node_modules/jest-changed-files/index.js, I found that the error is shortMessage: 'Command failed with ENAMETOOLONG: sl status...' and the reason why became clear when I interrogated Jest about what it thinks the repos are.
While the git repo is what you'd expect, the sl "repo" looks amazingly like a train wreck in motion:
"],"published_at":[0,"2025-04-02T14:00+01:00"],"updated_at":[0,"2025-04-02T13:00:03.425Z"],"feature_image":[0,"https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1yA1TgNlIUZwtZ4bIL39EJ/0bfc765dae00213eca7e48a337c8178e/image1.png"],"tags":[1,[[0,{"id":[0,"2UVIYusJwlvsmPYl2AvSuR"],"name":[0,"Deep Dive"],"slug":[0,"deep-dive"]}],[0,{"id":[0,"383iv0UQ6Lp0GZwOAxGq2p"],"name":[0,"Linux"],"slug":[0,"linux"]}],[0,{"id":[0,"3JAY3z7p7An94s6ScuSQPf"],"name":[0,"Developer Platform"],"slug":[0,"developer-platform"]}],[0,{"id":[0,"4HIPcb68qM0e26fIxyfzwQ"],"name":[0,"Developers"],"slug":[0,"developers"]}]]],"relatedTags":[0],"authors":[1,[[0,{"name":[0,"Yew Leong"],"slug":[0,"yew-leong"],"bio":[0],"profile_image":[0,"https://cf-assets.www.cloudflare.com/zkvhlag99gkb/658l52gIu4kyDjwnJCelUt/d0a7c86def68692d50d9b4a0d6fc2f18/_tmp_mini_magick20221116-43-2dcplr.jpg"],"location":[0],"website":[0],"twitter":[0],"facebook":[0]}]]],"meta_description":[0,"Yarn tests fail consistently at the 27-second mark. The usual suspects are swiftly eliminated to no avail. A deep dive is taken to comb through traces, only to be derailed into an unexpected crash investigation."],"primary_author":[0,{}],"localeList":[0,{"name":[0,"blog-english-only"],"enUS":[0,"English for Locale"],"zhCN":[0,"No Page for Locale"],"zhHansCN":[0,"No Page for Locale"],"zhTW":[0,"No Page for Locale"],"frFR":[0,"No Page for Locale"],"deDE":[0,"No Page for Locale"],"itIT":[0,"No Page for Locale"],"jaJP":[0,"No Page for Locale"],"koKR":[0,"No Page for Locale"],"ptBR":[0,"No Page for Locale"],"esLA":[0,"No Page for Locale"],"esES":[0,"No Page for Locale"],"enAU":[0,"No Page for Locale"],"enCA":[0,"No Page for Locale"],"enIN":[0,"No Page for Locale"],"enGB":[0,"No Page for Locale"],"idID":[0,"No Page for Locale"],"ruRU":[0,"No Page for Locale"],"svSE":[0,"No Page for Locale"],"viVN":[0,"No Page for Locale"],"plPL":[0,"No Page for Locale"],"arAR":[0,"No Page for Locale"],"nlNL":[0,"No Page for Locale"],"thTH":[0,"No Page for Locale"],"trTR":[0,"No Page for Locale"],"heIL":[0,"No Page for Locale"],"lvLV":[0,"No Page for Locale"],"etEE":[0,"No Page for Locale"],"ltLT":[0,"No Page for Locale"]}],"url":[0,"https://blog.cloudflare.com/yarn-test-suffers-strange-derailment"],"metadata":[0,{"title":[0,"A steam locomotive from 1993 broke my yarn test"],"description":[0,"Yarn tests fail consistently at the 27-second mark. The usual suspects are swiftly eliminated to no avail. A deep dive is taken to comb through traces, only to be derailed into an unexpected crash investigation."],"imgPreview":[0,"https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4y36mkMs8GlYflk6MMmLnh/42840cf34748ac9b6619c5d47704db10/A_steam_locomotive_from_1993_broke_my_yarn_test-OG.png"]}]}],"translations":[0,{"posts.by":[0,"By"],"footer.gdpr":[0,"GDPR"],"lang_blurb1":[0,"This post is also available in {lang1}."],"lang_blurb2":[0,"This post is also available in {lang1} and {lang2}."],"lang_blurb3":[0,"This post is also available in {lang1}, {lang2} and {lang3}."],"footer.press":[0,"Press"],"header.title":[0,"The Cloudflare Blog"],"search.clear":[0,"Clear"],"search.filter":[0,"Filter"],"search.source":[0,"Source"],"footer.careers":[0,"Careers"],"footer.company":[0,"Company"],"footer.support":[0,"Support"],"footer.the_net":[0,"theNet"],"search.filters":[0,"Filters"],"footer.our_team":[0,"Our team"],"footer.webinars":[0,"Webinars"],"page.more_posts":[0,"More posts"],"posts.time_read":[0,"{time} min read"],"search.language":[0,"Language"],"footer.community":[0,"Community"],"footer.resources":[0,"Resources"],"footer.solutions":[0,"Solutions"],"footer.trademark":[0,"Trademark"],"header.subscribe":[0,"Subscribe"],"footer.compliance":[0,"Compliance"],"footer.free_plans":[0,"Free plans"],"footer.impact_ESG":[0,"Impact/ESG"],"posts.follow_on_X":[0,"Follow on X"],"footer.help_center":[0,"Help center"],"footer.network_map":[0,"Network Map"],"header.please_wait":[0,"Please Wait"],"page.related_posts":[0,"Related posts"],"search.result_stat":[0,"Results {search_range} of {search_total} for {search_keyword}"],"footer.case_studies":[0,"Case Studies"],"footer.connect_2024":[0,"Connect 2024"],"footer.terms_of_use":[0,"Terms of Use"],"footer.white_papers":[0,"White Papers"],"footer.cloudflare_tv":[0,"Cloudflare TV"],"footer.community_hub":[0,"Community Hub"],"footer.compare_plans":[0,"Compare plans"],"footer.contact_sales":[0,"Contact Sales"],"header.contact_sales":[0,"Contact Sales"],"header.email_address":[0,"Email Address"],"page.error.not_found":[0,"Page not found"],"footer.developer_docs":[0,"Developer docs"],"footer.privacy_policy":[0,"Privacy Policy"],"footer.request_a_demo":[0,"Request a demo"],"page.continue_reading":[0,"Continue reading"],"footer.analysts_report":[0,"Analyst reports"],"footer.for_enterprises":[0,"For enterprises"],"footer.getting_started":[0,"Getting Started"],"footer.learning_center":[0,"Learning Center"],"footer.project_galileo":[0,"Project Galileo"],"pagination.newer_posts":[0,"Newer Posts"],"pagination.older_posts":[0,"Older Posts"],"posts.social_buttons.x":[0,"Discuss on X"],"search.icon_aria_label":[0,"Search"],"search.source_location":[0,"Source/Location"],"footer.about_cloudflare":[0,"About Cloudflare"],"footer.athenian_project":[0,"Athenian Project"],"footer.become_a_partner":[0,"Become a partner"],"footer.cloudflare_radar":[0,"Cloudflare Radar"],"footer.network_services":[0,"Network services"],"footer.trust_and_safety":[0,"Trust & Safety"],"header.get_started_free":[0,"Get Started Free"],"page.search.placeholder":[0,"Search Cloudflare"],"footer.cloudflare_status":[0,"Cloudflare Status"],"footer.cookie_preference":[0,"Cookie Preferences"],"header.valid_email_error":[0,"Must be valid email."],"search.result_stat_empty":[0,"Results {search_range} of {search_total}"],"footer.connectivity_cloud":[0,"Connectivity cloud"],"footer.developer_services":[0,"Developer services"],"footer.investor_relations":[0,"Investor relations"],"page.not_found.error_code":[0,"Error Code: 404"],"search.autocomplete_title":[0,"Insert a query. Press enter to send"],"footer.logos_and_press_kit":[0,"Logos & press kit"],"footer.application_services":[0,"Application services"],"footer.get_a_recommendation":[0,"Get a recommendation"],"posts.social_buttons.reddit":[0,"Discuss on Reddit"],"footer.sse_and_sase_services":[0,"SSE and SASE services"],"page.not_found.outdated_link":[0,"You may have used an outdated link, or you may have typed the address incorrectly."],"footer.report_security_issues":[0,"Report Security Issues"],"page.error.error_message_page":[0,"Sorry, we can't find the page you are looking for."],"header.subscribe_notifications":[0,"Subscribe to receive notifications of new posts:"],"footer.cloudflare_for_campaigns":[0,"Cloudflare for Campaigns"],"header.subscription_confimation":[0,"Subscription confirmed. Thank you for subscribing!"],"posts.social_buttons.hackernews":[0,"Discuss on Hacker News"],"footer.diversity_equity_inclusion":[0,"Diversity, equity & inclusion"],"footer.critical_infrastructure_defense_project":[0,"Critical Infrastructure Defense Project"]}]}" ssr="" client="load" opts="{"name":"PostCard","value":true}" await-children="">
Yarn tests fail consistently at the 27-second mark. The usual suspects are swiftly eliminated. A deep dive is taken to comb through traces, only to be derailed into an unexpected crash investigation....
The Linux kernel can produce a hung task warning. Searching the Internet and the kernel docs, you can find a brief explanation that the process is stuck in the uninterruptible state....
Multi-Path TCP (MPTCP) leverages multiple network interfaces, like Wi-Fi and cellular, to provide seamless mobility for more reliable connectivity. While promising, MPTCP is still in its early stages,...
This post illustrates some of the Linux Kernel features, which are helping us to keep our production systems more secure. We will deep dive into how they work and why you may consider enabling them as well...
This is our story of what we learned about the connect() implementation for TCP in Linux. Both its strong and weak points. How connect() latency changes under pressure, and how to open connection so that the syscall latency is deterministic and time-bound...
The initial posts are dedicated to the x86 architecture. Since then, the fleet of our working machines has expanded to include a large and growing number of ARM CPUs. This time we’ll repeat this exercise for the aarch64 architecture....
If you run your software on Linux, the Linux Kernel itself can satisfy all your cryptographic needs! In this post we will explore Linux Crypto API for user applications and try to understand its pros and cons...
If I navigate to https://blog.cloudflare.com/, my browser will connect to a remote TCP address from the local IP address assigned to my machine, and a randomly chosen local TCP port. What happens if I then decide to head to another site?...
In the Linux kernel before 6.1.6, a NULL pointer dereference bug in the traffic control subsystem allows an unprivileged user to trigger a denial of service (system crash) via a crafted traffic control configuration that is set up with "tc qdisc" and "tc class" commands....
A race condition in the virtual ethernet driver of the Linux kernel led to occasional packet content corruptions, which resulted in unwanted packet drops by one of our DDoS mitigation systems. This blogpost describes the thought process and technique we used to debug this complex...
Many leaks happen because of software bugs and security vulnerabilities. In this post we will learn how the Linux kernel can help protect cryptographic keys from a whole class of potential security vulnerabilities: memory access violations....
In this blog post I'll share my journey deep into the Linux networking stack, trying to understand the memory and window management of the receiving side of a TCP connection...
A crash in a development version of flowtrackd (the daemon that powers our Advanced TCP Protection) highlighted that libxdp (and specifically the AF_XDP part) was not Linux network namespace aware. ...
Learn how to patch Linux security vulnerabilities without rebooting the hardware and how to tighten the security of your Linux operating system with eBPF Linux Security Module...
Chances are you might have heard of io_uring. It first appeared in Linux 5.1, back in 2019, and was advertised as the new API for asynchronous I/O. Its goal was to be an alternative to the deemed-to-be-broken-beyond-repair AIO, the “old” asynchronous I/O API...
Often programmers have assumptions that turn out, to their surprise, to be invalid. From my experience this happens a lot. Every API, technology or system can be abused beyond its limits and break in a miserable way...