Anchoring Trust: A Hardware Secure Boot Story

As a security company, we pride ourselves on finding innovative ways to protect our platform to, in turn, protect the data of our customers. Part of this approach is implementing progressive methods in protecting our hardware at scale. While we have blogged about how we address security threats from application to memory, the attacks on hardware, as well as firmware, have increased substantially. The data cataloged in the National Vulnerability Database (NVD) has shown the frequency of hardware and firmware-level vulnerabilities rising year after year.

Technologies like secure boot, common in desktops and laptops, have been ported over to the server industry as a method to combat firmware-level attacks and protect a device’s boot integrity. These technologies require that you create a trust ‘anchor’, an authoritative entity for which trust is assumed and not derived. A common trust anchor is the system Basic Input/Output System (BIOS) or the Unified Extensible Firmware Interface (UEFI) firmware.

While this ensures that the device boots only signed firmware and operating system bootloaders, does it protect the entire boot process? What protects the BIOS/UEFI firmware from attacks?

The Boot Process

Before we discuss how we secure our boot process, we will first go over how we boot our machines.

The above image shows the following sequence of events:

After powering on the system (through a baseboard management controller (BMC) or physically pushing a button on the system), the system unconditionally executes the UEFI firmware residing on a flash chip on the motherboard.
UEFI performs some hardware and peripheral initialization and executes the Preboot Execution Environment (PXE) code, which is a small program that boots an image over the network and usually resides on a flash chip on the network card.
PXE sets up the network card, and downloads and executes a small program bootloader through an open source boot firmware called iPXE.
iPXE loads a script that automates a sequence of commands for the bootloader to know how to boot a specific operating system (sometimes several of them). In our case, it loads our Linux kernel, initrd (this contains device drivers which are not directly compiled into the kernel), and a standard Linux root filesystem. After loading these components, the bootloader executes and hands off the control to the kernel.
Finally, the Linux kernel loads any additional drivers it needs and starts applications and services.

UEFI Secure Boot

Our UEFI secure boot process is fairly straightforward, albeit customized for our environments. After loading the UEFI firmware from the bootloader, an initialization script defines the following variables:

Platform Key (PK): It serves as the cryptographic root of trust for secure boot, giving capabilities to manipulate and/or validate the other components of the secure boot framework.

Trusted Database (DB): Contains a signed (by platform key) list of hashes of all PCI option ROMs, as well as a public key, which is used to verify the signature of the bootloader and the kernel on boot.

These variables are respectively the master platform public key, which is used to sign all other resources, and an allow list database, containing other certificates, binary file hashes, etc. In default secure boot scenarios, Microsoft keys are used by default. At Cloudflare we use our own, which makes us the root of trust for UEFI:

But, by setting our trust anchor in the UEFI firmware, what attack vectors still exist?

UEFI Attacks

As stated previously, firmware and hardware attacks are on the rise. It is clear from the figure below that firmware-related vulnerabilities have increased significantly over the last 10 years, especially since 2017, when the hacker community started attacking the firmware on different platforms:

This upward trend, coupled with recent malware findings in UEFI, shows that trusting firmware is becoming increasingly problematic.

By tainting the UEFI firmware image, you poison the entire boot trust chain. The ability to trust firmware integrity is important beyond secure boot. For example, if you can't trust the firmware not to be compromised, you can't trust things like trusted platform module (TPM) measurements to be accurate, because the firmware is itself responsible for doing these measurements (e.g a TPM is not an on-path security mechanism, but instead it requires firmware to interact and cooperate with). Firmware may be crafted to extend measurements that are accepted by a remote attestor, but that don't represent what's being locally loaded. This could cause firmware to have a questionable measured boot and remote attestation procedure.

If we can’t trust firmware, then hardware becomes our last line of defense.

Hardware Root of Trust

Early this year, we made a series of blog posts on why we chose AMD EPYC processors for our Gen X servers. With security in mind, we started turning on features that were available to us and set forth the plan of using AMD silicon as a Hardware Root of Trust (HRoT).

Platform Secure Boot (PSB) is AMD’s implementation of hardware-rooted boot integrity. Why is it better than UEFI firmware-based root of trust? Because it is intended to assert, by a root of trust anchored in the hardware, the integrity and authenticity of the System ROM image before it can execute. It does so by performing the following actions:

Authenticates the first block of BIOS/UEFI prior to releasing x86 CPUs from reset.
Authenticates the System Read-Only Memory (ROM) contents on each boot, not just during updates.
Moves the UEFI Secure Boot trust chain to immutable hardware.

This is accomplished by the AMD Platform Security Processor (PSP), an ARM Cortex-A5 microcontroller that is an immutable part of the system on chip (SoC). The PSB consists of two components:

On-chip Boot ROM

Embeds a SHA384 hash of an AMD root signing key
Verifies and then loads the off-chip PSP bootloader located in the boot flash

Off-chip Boot**loader******

Locates the PSP directory table that allows the PSP to find and load various images
Authenticates first block of BIOS/UEFI code
Releases CPUs after successful authentication

The PSP secures the On-chip Boot ROM code, loads the off-chip PSP firmware into PSP static random access memory (SRAM) after authenticating the firmware, and passes control to it.
The Off-chip Bootloader (BL) loads and specifies applications in a specific order (whether or not the system goes into a debug state and then a secure EFI application binary interface to the BL)
The system continues initialization through each bootloader stage.
If each stage passes, then the UEFI image is loaded and the x86 cores are released.

Now that we know the booting steps, let’s build an image.

Build Process

Public Key Infrastructure

Before the image gets built, a public key infrastructure (PKI) is created to generate the key pairs involved for signing and validating signatures:

Our original device manufacturer (ODM), as a trust extension, creates a key pair (public and private) that is used to sign the first segment of the BIOS (private key) and validates that segment on boot (public key).

On AMD’s side, they have a key pair that is used to sign (the AMD root signing private key) and certify the public key created by the ODM. This is validated by AMD’s root signing public key, which is stored as a hash value (RSASSA-PSS: SHA-384 with 4096-bit key is used as the hashing algorithm for both message and mask generation) in SPI-ROM.

Private keys (both AMD and ODM) are stored in hardware security modules.

Because of the way the PKI mechanisms are built, the system cannot be compromised if only one of the keys is leaked. This is an important piece of the trust hierarchy that is used for image signing.

Certificate Signing Request

Once the PKI infrastructure is established, a BIOS signing key pair is created, together with a certificate signing request (CSR). Creating the CSR uses known common name (CN) fields that many are familiar with:

countryName
stateOrProvinceName
localityName
organizationName

In addition to the fields above, the CSR will contain a serialNumber field, a 32-bit integer value represented in ASCII HEX format that encodes the following values:

PLATFORM_VENDOR_ID: An 8-bit integer value assigned by AMD for each ODM.
PLATFORM_MODEL_ID: A 4-bit integer value assigned to a platform by the ODM.
BIOS_KEY_REVISION_ID: is set by the ODM encoding a 4-bit key revision as unary counter value.
DISABLE_SECURE_DEBUG: Fuse bit that controls whether secure debug unlock feature is disabled permanently.
DISABLE_AMD_BIOS_KEY_USE: Fuse bit that controls if the BIOS, signed by an AMD key, (with vendor ID == 0) is permitted to boot on a CPU with non-zero Vendor ID.
DISABLE_BIOS_KEY_ANTI_ROLLBACK: Fuse bit that controls whether BIOS key anti-rollback feature is enabled.

Remember these values, as we’ll show how we use them in a bit. Any of the DISABLE values are optional, but recommended based on your security posture/comfort level.

AMD, upon processing the CSR, provides the public part of the BIOS signing key signed and certified by the AMD signing root key as a RSA Public Key Token file (.stkn) format.

Putting It All Together

The following is a step-by-step illustration of how signed UEFI firmware is built:

The ODM submits their public key used for signing Cloudflare images to AMD.
AMD signs this key using their RSA private key and passes it back to ODM.
The AMD public key and the signed ODM public key are part of the final BIOS SPI image.
The BIOS source code is compiled and various BIOS components (PEI Volume, Driver eXecution Environment (DXE) volume, NVRAM storage, etc.) are built as usual.
The PSP directory and BIOS directory are built next. PSP directory and BIOS directory table points to the location of various firmware entities.
The ODM builds the signed BIOS Root of Trust Measurement (RTM) signature based on the blob of BIOS PEI volume concatenated with BIOS Directory header, and generates the digital signature of this using the private portion of ODM signing key. The SPI location for signed BIOS RTM code is finally updated with this signature blob.
Finally, the BIOS binaries, PSP directory, BIOS directory and various firmware binaries are combined to build the SPI BIOS image.

Enabling Platform Secure Boot

Platform Secure Boot is enabled at boot time with a PSB-ready firmware image. PSB is configured using a region of one-time programmable (OTP) fuses, specified for the customer. OTP fuses are on-chip non-volatile memory (NVM) that permits data to be written to memory only once. There is NO way to roll the fused CPU back to an unfused one.

Enabling PSB in the field will go through two steps: fusing and validating.

Fusing: Fuse the values assigned in the serialNumber field that was generated in the CSR
Validating: Validate the fused values and the status code registers

If validation is successful, the BIOS RTM signature is validated using the ODM BIOS signing key, PSB-specific registers (MP0_C2P_MSG_37 and MP0_C2P_MSG_38) are updated with the PSB status and fuse values, and the x86 cores are released

If validation fails, the registers above are updated with the PSB error status and fuse values, and the x86 cores stay in a locked state.

Let’s Boot!

With a signed image in hand, we are ready to enable PSB on a machine. We chose to deploy this on a few machines that had an updated, unsigned AMI UEFI firmware image, in this case version 2.16. We use a couple of different firmware update tools, so, after a quick script, we ran an update to change the firmware version from 2.16 to 2.18C (the signed image):

. $sudo ./UpdateAll.sh
Bin file name is ****.218C

BEGIN

+---------------------------------------------------------------------------+
|                 AMI Firmware Update Utility v5.11.03.1778                 |      
|                 Copyright (C)2018 American Megatrends Inc.                |                       
|                         All Rights Reserved.                              |
+---------------------------------------------------------------------------+
Reading flash ............... done
FFS checksums ......... ok
Check RomLayout ........ ok.
Erasing Boot Block .......... done
Updating Boot Block ......... done
Verifying Boot Block ........ done
Erasing Main Block .......... done
Updating Main Block ......... done
Verifying Main Block ........ done
Erasing NVRAM Block ......... done
Updating NVRAM Block ........ done
Verifying NVRAM Block ....... done
Erasing NCB Block ........... done
Updating NCB Block .......... done
Verifying NCB Block ......... done

Process completed.

After the update completed, we rebooted:

After a successful install, we validated that the image was correct via the sysfs information provided in the dmidecode output:

Testing

With a signed image installed, we wanted to test that it worked, meaning: what if an unauthorized user installed their own firmware image? We did this by downgrading the image back to an unsigned image, 2.16. In theory, the machine shouldn’t boot as the x86 cores should stay in a locked state. After downgrading, we rebooted and got the following:

This isn’t a powered down machine, but the result of booting with an unsigned image.

Flashing back to a signed image is done by running the same flashing utility through the BMC, so we weren’t bricked. Nonetheless, the results were successful.

Naming Convention

Our standard UEFI firmware images are alphanumeric, making it difficult to distinguish (by name) the difference between a signed and unsigned image (v2.16A vs v2.18C), for example. There isn’t a remote attestation capability (yet) to probe the PSB status registers or to store these values by means of a signature (e.g. TPM quote). As we transitioned to PSB, we wanted to make this easier to determine by adding a specific suffix: -sig that we could query in userspace. This would allow us to query this information via Prometheus. Changing the file name alone wouldn’t do it, so we had to make the following changes to reflect a new naming convention for signed images:

Update filename
Update BIOS version for setup menu
Update post message
Update SMBIOS type 0 (BIOS version string identifier)

Signed images now have a -sig suffix:

~$ sudo dmidecode -t0
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 3.3.0 present.
# SMBIOS implementations newer than version 3.2.0 are not
# fully supported by this version of dmidecode.

Handle 0x0000, DMI type 0, 26 bytes
BIOS Information
	Vendor: American Megatrends Inc.
	Version: V2.20-sig
	Release Date: 09/29/2020
	Address: 0xF0000
	Runtime Size: 64 kB
	ROM Size: 16 MB

Conclusion

Finding weaknesses in firmware is a challenge that many attackers have taken on. Attacks that physically manipulate the firmware used for performing hardware initialization during the booting process can invalidate many of the common secure boot features that are considered industry standard. By implementing a hardware root of trust that is used for code signing critical boot entities, your hardware becomes a 'first line of defense' in ensuring that your server hardware and software integrity can derive trust through cryptographic means.

What’s Next?

While this post discussed our current, AMD-based hardware platform, how will this affect our future hardware generations? One of the benefits of working with diverse vendors like AMD and Ampere (ARM) is that we can ensure they are baking in our desired platform security by default (which we’ll speak about in a future post), making our hardware security outlook that much brighter ?.

The Cloudflare Blog