
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Mon, 13 Apr 2026 22:09:29 GMT</lastBuildDate>
        <item>
            <title><![CDATA[Pairings in CIRCL]]></title>
            <link>https://blog.cloudflare.com/circl-pairings-update/</link>
            <pubDate>Wed, 13 Oct 2021 12:59:30 GMT</pubDate>
            <description><![CDATA[ Our Go cryptographic library CIRCL announces support for pairing-based cryptography. ]]></description>
            <content:encoded><![CDATA[ <p></p><p>In 2019, we announced the release of <a href="https://github.com/cloudflare/circl/">CIRCL</a>, an open-source cryptographic library written in Go that provides optimized implementations of several primitives for key exchange and digital signatures. We are pleased to announce a major update of our library: we have included more packages for elliptic curve-based cryptography (ECC), pairing-based cryptography, and quantum-resistant algorithms.</p><p>All of these packages are the foundation of work we’re doing on bringing the benefits of cutting edge research to Cloudflare. In the past we’ve <a href="/the-tls-post-quantum-experiment/">experimented with post-quantum algorithms</a>, used pairings to keep <a href="/geo-key-manager-how-it-works/">keys safe around the world</a>, and implemented <a href="/introducing-circl/">advanced elliptic curves</a>. Now we’re continuing that work, and sharing the foundation with everyone.</p><p>In this blog post we’re going to focus on pairing-based cryptography and give you a brief overview of some properties that make this topic so pleasant. If you are not so familiar with elliptic curves, we recommend this <a href="/a-relatively-easy-to-understand-primer-on-elliptic-curve-cryptography/">primer on ECC</a>.</p><p>Otherwise, let’s get ready, pairings have arrived!</p><p>What are pairings?</p><hr /><p>Elliptic curve cryptography enables an efficient instantiation of several cryptographic applications: public-key encryption, signatures, zero-knowledge proofs, and many other more exotic applications like <a href="https://en.wikipedia.org/wiki/Oblivious_transfer">oblivious transfer</a> and <a href="https://datatracker.ietf.org/doc/draft-irtf-cfrg-voprf/">OPRF</a>s. With all of those applications you might wonder what is the additional value that pairings offer? 
To see that, we first need to understand the basic properties of an elliptic curve system, and from there we can highlight what pairings add.</p><p>Conventional elliptic curve systems work with a single group \( \mathbb{G} \): the points of an elliptic curve \(E\). In this group, usually denoted additively, we can add the points \(P\) and \(Q\) and get another point on the curve \(R=P+Q\); also, we can multiply a point \(P\) by an integer scalar \(k\) by adding it to itself repeatedly:$$ kP = \underbrace{P+P+\dots+P}_{k \text{ terms}} $$This operation is known as scalar multiplication, which resembles exponentiation, and there are efficient algorithms for it. But given the points \(Q=kP\) and \(P\), it is very hard for an adversary who doesn’t know \(k\) to recover it. This is the Elliptic Curve Discrete Logarithm Problem (ECDLP).</p><p>Now we show a property of scalar multiplication that will help us understand the properties of pairings.</p><p><b><i>Scalar Multiplication is a Linear Map</i></b></p><p>Note the following equivalences:</p><p>\( (a+b)P = aP + bP \)</p><p>\( b (aP) = a (bP) \).</p><p>These are very useful properties for many protocols: for example, the last identity allows Alice and Bob to arrive at the same value when following the Diffie-Hellman key-agreement protocol.</p><p>But while point addition and scalar multiplication are nice, it’s also useful to be able to multiply points: if we had a point \(P\) along with \(aP\) and \(bP\), getting \(abP\) out would be very cool and would let us do all sorts of things. Unfortunately, Diffie-Hellman would immediately be insecure, so we can’t get what we want.</p><p>Guess what? 
Pairings provide an efficient, useful <i>sort of intermediate</i> point multiplication.</p><p>It’s only an intermediate multiplication because, although the operation takes two points as operands, the result of a pairing is not a point but an element of a different group; thus, in a pairing there are more groups involved, and all of them must contain the same number of elements.</p><p>A pairing is defined as  $$ e \colon\; \mathbb{G}_1 \times \mathbb{G}_2 \rightarrow \mathbb{G}_T $$Groups \(\mathbb{G}_1\) and \(\mathbb{G}_2\) contain points of an elliptic curve \(E\). More specifically, they are the \(r\)-torsion points, for a fixed prime \(r\). Some pairing instances fix \(\mathbb{G}_1=\mathbb{G}_2\), but it is common to use disjoint sets for efficiency reasons. The third group, \(\mathbb{G}_T\), has notable differences. First, it is written multiplicatively, unlike the other two groups. Second, \(\mathbb{G}_T\) is not a set of points on an elliptic curve; it is instead a subgroup of the multiplicative group of some larger finite field. It contains the elements that satisfy \(x^r=1\), better known as the \(r\)-roots of unity.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/nmjAHa9jrnE1lnmilM8rR/f0c5e06aaadf4fbd276a970e5426eaef/2ai5WUAviNg3iAxw9x-IMAv3mVWnrMrDOgvJEIHsKqqNNn3OzQdeDJd6kMo578rqUTqoCPdQcFz6rcrTiXlyfdxL_iDLQ9v6zvswJiz5ini3lLgSMfHbFIpIEwS7.png" />
            
</figure><p>Source: “<i>Pairings are not dead, just resting</i>” by Diego Aranha <a href="https://ecc2017.cs.ru.nl/slides/ecc2017-aranha.pdf">ECC-2017</a> (inspired by Avanzi’s talk at <a href="https://www.hyperelliptic.org/SPEED/slides09/avanzi.pdf">SPEED-2009</a>).</p><p>While every elliptic curve has a pairing, very few have one that is efficiently computable. Those that do, we call <i>pairing-friendly curves</i>.</p><p><b>The Pairing Operation is a Bilinear Map</b></p><p>What makes pairings special is that \(e\) is a <i>bilinear map</i>. Yes, the linear property of scalar multiplication is present twice, once per input group. Let’s see the first linear map.</p><p>For points \(P, Q, R\) and scalars \(a\) and \(b\) we have:</p><p>\( e(P+Q, R) = e(P, R) * e(Q, R) \)</p><p>\( e(aP, Q) = e(P, Q)^a \).</p><p>So a scalar \(a\) acting on the first operand as \(aP\) escapes from the input of the pairing and reappears in the output as an exponent in \(\mathbb{G}_T\). The same linear map is observed for the second group:</p><p>\( e(P, Q+R) = e(P, Q) * e(P, R) \)</p><p>\( e(P, bQ) = e(P, Q)^b \).</p><p>Hence, the pairing is bilinear. We will see below how this property becomes useful.</p><p><b>Can bilinear pairings help solve the ECDLP?</b></p><p>The <a href="https://www.dima.unige.it/~morafe/MaterialeCTC/p80-menezes.pdf">MOV</a> attack (by Menezes, Okamoto, and Vanstone) reduces the discrete logarithm problem on elliptic curves to one in finite fields. An attacker with knowledge of \(kP\) and the public points \(P\) and \(Q\) can recover \(k\) by computing:</p><p>\( g = e(P, Q) \),</p><p>\( g_k = e(kP, Q) = e(P, Q)^k \),</p><p>\( k = \log_g(g_k) = \log_g(g^k) \).</p><p>Note that the discrete logarithm to be solved was moved from \(\mathbb{G}_1\) to \(\mathbb{G}_T\). 
This helps an attacker only if the discrete logarithm is easier to solve in \(\mathbb{G}_T\) than in \(\mathbb{G}_1\), and surprisingly, for some curves this is the case.</p><p>Fortunately, pairings do not present a threat to standard curves (such as the NIST curves or Curve25519), because those curves are constructed in such a way that \(\mathbb{G}_T\) gets very large, which makes the pairing operation no longer efficient to compute.</p><p>This attack strategy was one of the first applications of pairings in cryptanalysis, as a tool for solving the discrete logarithm. Later, people noticed that the properties of pairings are so useful that they can also be used constructively to do cryptography. One of the fascinating truisms of cryptography is that one person's sledgehammer is another person's brick: while pairings yield a generic attack strategy for the ECDLP, they can also be used as a building block in a ton of useful applications.</p>
    <div>
      <h3>Applications of Pairings</h3>
      <a href="#applications-of-pairings">
        
      </a>
    </div>
    <p>In the 2000s, a large wave of research aimed at applying pairings to many practical problems. An iconic pairing-based system was created by Antoine Joux, who constructed a <a href="https://link.springer.com/chapter/10.1007/10722028_23">one-round Diffie-Hellman</a> key exchange for three parties.</p><p>Let’s first see how a three-party Diffie-Hellman is done without pairings. Alice, Bob and Charlie want to agree on a shared key, so they compute, respectively, \(aP\), \(bP\) and \(cP\) for a public point \(P\). Then they send each other the points they computed. So Alice receives \(cP\) from Charlie and sends \(aP\) to Bob, who can then send \(abP\) to Charlie and get \(acP\) from Alice, and so on. After all this is done, they can all compute \(k=abcP\). Can this be performed in a single round trip?</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1pGrpJQPbIktFdaxQz81fd/a0bcccab1b5efa958bb221de5a220f43/vnfAPEoM5vefRaP6wysBktjTyxvh4sxXzYY3HUz3ErengPo01exd3LDLZ38SQi_lYTKoJZQEPuvGYaLBE7Kg7ExNErj8Yu5k06klm8cQoXdwgtOZkDR3umqZWa6o.png" />
            
            </figure><p>Two round Diffie-Hellman without pairings.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/hIYImdhvrm8dbikA9HnOv/9b5970d153b0c0146f844b310330609e/n8lyvbn43kHVis2z6mlUO5Msq5jVy2OixIA1xoZ9H3S0XZIczsdmnipD71gvvRDF6hZG10EWY0V803za-sihFWFMf32EF_0rRAlaA_zVXivOv0zYH1KOeMW4qZun.png" />
            
</figure><p>One round Diffie-Hellman with pairings.</p><p>The 3-party Diffie-Hellman protocol needs two communication rounds (top), but with the use of pairings a one-round protocol is possible (bottom).</p><p>Affirmative! Antoine Joux <a href="https://doi.org/10.1007/s00145-004-0312-y">showed how</a> to agree on a shared secret in a single round of communication. Alice announces \(aP\), gets \(bP\) and \(cP\) from Bob and Charlie respectively, and then computes \(k=e(bP, cP)^a\). Likewise, Bob computes \(e(aP,cP)^b\) and Charlie computes \(e(aP,bP)^c\). It’s not difficult to convince yourself that all these values are equivalent, just by looking at the bilinear property.</p><p>\( e(bP,cP)^a  = e(aP,cP)^b  = e(aP,bP)^c = e(P,P)^{abc}\)</p><p>With pairings we’ve done in one round what would otherwise take two.</p><p>Another application in cryptography addresses a problem posed by Shamir in 1984: does there exist an encryption scheme in which the public key is an arbitrary string? Imagine if your public key were your email address: it would be easy to remember, and certificate authorities and certificate management would be unnecessary.</p><p>A solution to this problem came some years later, in 2001, in the form of the Identity-based Encryption (IBE) scheme proposed by <a href="https://crypto.stanford.edu/~dabo/papers/bfibe.pdf">Boneh and Franklin</a>, which uses bilinear pairings as its main tool.</p><p>Nowadays, pairings are used for the <a href="https://eprint.iacr.org/2012/215">zk-SNARKs</a> that make Zcash an anonymous currency, and are also used in <a href="https://developers.cloudflare.com/randomness-beacon/about">drand</a> to generate publicly verifiable randomness. Pairings and the compact, aggregatable BLS signatures are used in Ethereum. 
We have used pairings ourselves: they let us implement the compact broadcast and negative broadcast schemes that together make <a href="/geo-key-manager-how-it-works/">Geo Key Manager</a> work.</p><p>In order to build these schemes we have to implement pairings, and to do that we need to understand the mathematics behind them.</p>
    <div>
      <h3>Where do pairings come from?</h3>
      <a href="#where-do-pairings-come-from">
        
      </a>
    </div>
    <p>In order to deeply understand pairings we must understand the interplay of geometry and arithmetic, and the origins of the group law. The starting point is the concept of a <i>divisor</i>, a formal combination of points on the curve:</p><p>\( D = \sum n_i P_i \)</p><p>The sum of all the coefficients \(n_i\) is the degree of the divisor. If we have a function on the curve with poles and zeros, we can count them with multiplicity to get a <i>principal divisor</i>: poles are counted as negative, zeros as positive. For example, on the projective line the function \(x\) has the divisor \((0)-(\infty)\).</p><p>All principal divisors have degree \(0\). The group of degree-zero divisors modulo the principal divisors is the Jacobian: we take all the degree-zero divisors and freely add or subtract principal divisors, obtaining an abelian variety called the Jacobian.</p><p>Until now our constructions have worked for any curve. Elliptic curves have a special property: since a line intersects the curve in three points, it’s always possible to turn an element of the Jacobian into one of the form \((P)-(O)\) for a point \(P\). This is where the addition law of elliptic curves comes from.</p>
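<p>A concrete example makes the last step clearer. A line through two points of the curve meets it at a third point, and counting its intersections (together with a triple pole at the point at infinity \(O\)) gives a principal divisor:</p>

```latex
% A line \ell through the points P and Q on E meets E at a third
% point R, and has a pole of order 3 at O. Its divisor is therefore
% principal and of degree zero:
\mathrm{div}(\ell) = (P) + (Q) + (R) - 3(O)
% Modulo principal divisors this means
% ((P)-(O)) + ((Q)-(O)) \equiv -((R)-(O)),
% so the class of (P)-(O) plus the class of (Q)-(O) is the class of
% (-R)-(O): exactly the chord-and-tangent addition law P + Q = -R.
```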
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4eisQYqRXhDNLTdMbiFZRO/0db5b48f5cd06ea377cd944c5120036a/DZUPeC6dl9fx4W7ggusFvWzN6-zjcQAroZZAcu2BsDWuhOd1j9_YjfjyCpCI2jOwLi9IoCDgKSNEyb4Zc3q7cp5wVM6_gyNMBDHfttqQ_cTxedlMzBLS8EOb2miG.png" />
            
</figure><p>The relation between addition on an elliptic curve and the geometry of the curve. Source file: <a href="https://commons.wikimedia.org/wiki/File:ECClines.svg">ECClines.svg</a></p><p>Given a function \(f\), we can evaluate it on a divisor \(D=\sum n_i P_i\) by taking the product \(\prod f(P_i)^{n_i}\). And if two functions \(f\) and \(g\) have disjoint divisors, we have the <i>Weil duality</i>:</p><p>\( f(\text{div}(g)) = g(\text{div}(f)) \).</p><p>The existence of Weil duality is what gives us the bilinear map we seek. Given an \(r\)-torsion point \(T\), we have a function \(f\) whose divisor is \(r(T)-r(O)\). We can write down an auxiliary function \(g\) such that \(f(rP)=g(P)^r\) for any \(P\). We then get a pairing by taking:</p><p>\(e_r(S,T)=\frac{g(X+S)}{g(X)}\).</p><p>The auxiliary point \(X\) is any point that makes the numerator and denominator defined.</p><p>In practice, the pairing we have defined above, the Weil pairing, is little used. It was historically the first pairing and is extremely important in the mathematics of elliptic curves, but faster alternatives based on more complicated underlying mathematics are used today. These faster pairings use different \(\mathbb{G}_1\) and \(\mathbb{G}_2\), whereas the Weil pairing made them the same.</p>
    <div>
      <h3>Shift in parameters</h3>
      <a href="#shift-in-parameters">
        
      </a>
    </div>
    <p>As we saw earlier, the discrete logarithm problem can be attacked either in the group of elliptic curve points or in the third group (a subgroup of an extension of a prime field), whichever is weaker. This is why the parameters that define a pairing must balance the security of all three groups.</p><p>Before 2015, a good balance between the extension degree, the size of the prime field, and the security of the scheme was achieved by the family of <a href="https://eprint.iacr.org/2005/133.pdf">Barreto-Naehrig</a> (BN) curves. For 128 bits of security, BN curves use an extension of degree 12 and a prime of 256 bits; as a result they are an efficient choice for implementation.</p><p>A breakthrough for pairings occurred in 2015, when Kim and Barbulescu <a href="https://ia.cr/2015/1027">published a result</a> that accelerated attacks in finite fields. This forced an increase in field sizes to comply with standard security levels. Just as short hashes like MD5 were deprecated once they became insecure and \(2^{64}\) operations were no longer out of reach, and RSA-1024 was replaced with RSA-2048, we regularly change parameters to deal with improved attacks.</p><p>For pairings this implied the use of larger primes for all the groups. Roughly speaking, the previous 192-bit security level becomes the new 128-bit level after this attack. This shift in parameters also brings the family of <a href="https://eprint.iacr.org/2002/088.pdf">Barreto-Lynn-Scott</a> (BLS) curves to the stage, because pairings on BLS curves are faster than on BN curves in this new setting. Hence, currently, BLS curves using an extension of degree 12 and primes of around 384 bits provide the equivalent of 128-bit security.</p><p>The IETF draft (currently in preparation) <a href="https://datatracker.ietf.org/doc/draft-irtf-cfrg-pairing-friendly-curves/">draft-irtf-cfrg-pairing-friendly-curves</a> specifies secure pairing-friendly elliptic curves. It includes parameters for the BN and BLS families of curves. 
It also targets different security levels to provide crypto agility for some applications relying on pairing-based cryptography.</p>
    <div>
      <h2>Implementing Pairings in Go</h2>
      <a href="#implementing-pairings-in-go">
        
      </a>
    </div>
    <p>Historically, notable examples of software libraries implementing pairings include <a href="https://crypto.stanford.edu/pbc/">PBC</a> by Ben Lynn, <a href="https://github.com/miracl/core">Miracl</a> by Michael Scott, and <a href="https://github.com/relic-toolkit/relic">Relic</a> by Diego Aranha. All of them are written in C/C++, and some ports and wrappers to other languages exist.</p><p>In the Go ecosystem we can find the <a href="https://pkg.go.dev/golang.org/x/crypto/bn256">golang.org/x/crypto/bn256</a> package by Adam Langley, an implementation of a pairing using a BN curve with a 256-bit prime. Our colleague Brendan McMillion built <a href="https://github.com/cloudflare/bn256">github.com/cloudflare/bn256</a>, which dramatically improves the speed of pairing operations for that curve. See the <a href="https://youtu.be/eHqIq1kFJJo?t=791">RWC-2018</a> talk for our use case of pairing-based cryptography. This time we wanted to go one step further, so we started looking for alternative pairing implementations.</p><p>Although one can find many libraries that implement pairings, our goal was to rely on one that is efficient, includes protection against side channels, and exposes a flexible, application-oriented API that permits generality while avoiding common security pitfalls. This motivated us to include pairings in <a href="https://github.com/cloudflare/circl/tree/master/ecc/bls12381">CIRCL</a>. We followed best practices for secure code development, and we want to share with you some details about the implementation.</p><p>We started by choosing a pairing-friendly curve. Due to the attack previously mentioned, the BN256 curve does not meet the 128-bit security level, so a stronger curve is needed. One such curve is <a href="https://electriccoin.co/blog/new-snark-curve/">BLS12-381</a>, which is widely used in zk-SNARK protocols and short signature schemes. 
Using this curve makes our Go pairing implementation interoperable with implementations available in other programming languages, so other projects can benefit from using CIRCL too.</p><p>This code snippet tests the linearity property of pairings and shows how easy it is to use our library.</p><div><pre>import (
    "crypto/rand"
    "fmt"

    e "github.com/cloudflare/circl/ecc/bls12381"
)

func ExamplePairing() {
    P, Q := e.G1Generator(), e.G2Generator()
    a, b := new(e.Scalar), new(e.Scalar)
    aP, bQ := new(e.G1), new(e.G2)
    ea, eb := new(e.Gt), new(e.Gt)

    a.Random(rand.Reader)
    b.Random(rand.Reader)

    aP.ScalarMult(a, P)
    bQ.ScalarMult(b, Q)

    g  := e.Pair( P,  Q)
    ga := e.Pair(aP,  Q)
    gb := e.Pair( P, bQ)

    ea.Exp(g, a)
    eb.Exp(g, b)
    linearLeft := ea.IsEqual(ga)  // e(P,Q)^a == e(aP,Q)
    linearRight := eb.IsEqual(gb) // e(P,Q)^b == e(P,bQ)

    fmt.Print(linearLeft &amp;&amp; linearRight)
    // Output: true
}
</pre></div>
<p>We applied several optimizations that improve both the performance and the security of the implementation. In fact, since the parameters of the curve are fixed, some optimizations become easier to apply; for example, in the code for prime field arithmetic and in the towering construction for extension fields, as we detail next.</p>
    <div>
      <h3>Formally-verified arithmetic using fiat-crypto</h3>
      <a href="#formally-verified-arithmetic-using-fiat-crypto">
        
      </a>
    </div>
    <p>One of the more difficult parts of a cryptography library to implement correctly is the prime field arithmetic. Typically people specialize it for speed, but there are many tedious constraints on what the inputs to operations can be in order to ensure correctness. Across many libraries, vulnerabilities have occurred when people got this wrong. However, this code is perfect for machines to write and check. One such tool is <a href="https://github.com/mit-plv/fiat-crypto">fiat-crypto</a>.</p><p>Using fiat-crypto to generate the prime field arithmetic means that we have a formal verification that the code does what we need. The fiat-crypto tool is <a href="https://github.com/cloudflare/circl/blob/5115a7384c00c16f684872cd8019e82a7b385f30/ecc/bls12381/ff/gen.go">invoked in this script</a>, and produces Go code for addition, subtraction, multiplication, and squaring over the 381-bit prime field used by the BLS12-381 curve. Other operations are not covered, but they are much easier to check and analyze by hand.</p><p>Another advantage is that it avoids relying on the generic <a href="https://golang.org/pkg/math/big/#Int">big.Int</a> package, which is slow, frequently spills variables to the heap causing dynamic memory allocations, and, most importantly, does not run in constant time. Instead, the generated code is straight-line code, with no branches at all, and relies on the <a href="https://golang.org/pkg/math/bits">math/bits</a> package for access to machine-level instructions. Automated code generation also means that it’s easier to apply new techniques to all primitives.</p>
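<p>To give a flavor of what straight-line, branch-free field code looks like, here is a hand-written sketch (and thus unverified, unlike fiat-crypto's output, which also differs in structure) of modular addition for the BLS12-381 prime, using only <code>math/bits</code>:</p>

```go
package main

import (
	"fmt"
	"math/bits"
)

// The BLS12-381 prime p as six 64-bit little-endian limbs.
var prime = [6]uint64{
	0xb9feffffffffaaab, 0x1eabfffeb153ffff, 0x6730d2a0f6b0f624,
	0x64774b84f38512bf, 0x4b1ba7b6434bacd7, 0x1a0111ea397fe69a,
}

// fpAdd returns (x + y) mod p for reduced inputs (x, y < p), using only
// straight-line, branch-free operations in the spirit of fiat-crypto's
// generated code. Since x + y < 2p < 2^384, the sum never overflows
// six limbs.
func fpAdd(x, y [6]uint64) [6]uint64 {
	var s, t, r [6]uint64
	var carry, borrow uint64
	for i := 0; i < 6; i++ { // full 384-bit addition
		s[i], carry = bits.Add64(x[i], y[i], carry)
	}
	for i := 0; i < 6; i++ { // provisional subtraction of p
		t[i], borrow = bits.Sub64(s[i], prime[i], borrow)
	}
	// If the subtraction borrowed, s < p, so keep s; otherwise keep the
	// reduced value t. Select with a mask instead of a branch.
	mask := -borrow // all ones when s < p, zero otherwise
	for i := 0; i < 6; i++ {
		r[i] = (s[i] & mask) | (t[i] &^ mask)
	}
	return r
}

func main() {
	pMinusOne := prime
	pMinusOne[0]--
	one := [6]uint64{1}
	fmt.Println(fpAdd(pMinusOne, one)) // (p-1) + 1 reduces to 0
}
```

There is no branch on secret data: the reduction is applied through a mask, exactly the style of code that is tedious for humans but easy for a tool to generate and prove correct.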
    <div>
      <h3>Tower Field Arithmetic</h3>
      <a href="#tower-field-arithmetic">
        
      </a>
    </div>
    <p>In addition to prime field arithmetic, that is, integers modulo a prime number \(p\), a pairing also requires higher-level arithmetic operations over extension fields.</p><p>To better understand what an extension field is, think of the analogous case of going from the reals to the complex numbers: the operations go by the usual names of addition, subtraction, multiplication and division \((+, -, \times, /)\); however, they are computed quite differently.</p><p>The complex numbers are a quadratic extension over the reals, so imagine a two-level house. The first floor is where the real numbers live; however, they cannot access the second floor by themselves. The complex numbers, on the other hand, can access the entire house through the use of a staircase. The equation \(f(x)=x^2+1\) was not solvable over the reals, but is solvable over the complex numbers, since they have the number \(i\) with \(i^2=-1\). And because they have the number \(i\), they also have numbers like \(3i\) and \(5+i\) that solve other equations that weren’t solvable over the reals either. This second story gives the roots of the polynomials a place to live.</p><p>Algebraically, we can view the complex numbers as \(\mathbb{R}[x]/(x^2+1)\), the space of polynomials where we consider \(x^2=-1\). Given a polynomial like \(x^3+5x+1\), we can turn it into \(4x+1\), which is another way of writing \(1+4i\). In this new field \(x^2+1=0\) holds automatically, and we have added \(x\) as a root of the polynomial we picked. This process of writing down a field extension by adding a root of a polynomial works over any field, including finite fields.</p><p>Following this analogy, one can construct not a house but a tower, say a \(k=12\)-floor building where the ground floor is the prime field \(\mathbb{F}_p\). The reason to build such a large tower is that we want to host VIP guests: namely a group called \(\mu_r\), <i>the</i> \(r\)<i>-roots of unity</i>. 
There are exactly \(r\) members, and they behave as an (algebraic) group: i.e., for all \(x,y \in \mu_r\), it follows that \(x*y \in \mu_r\) and \(x^r = 1\).</p><p>One particularity is that operations on the top floor can be costly. Assume that whenever an operation is needed on the top floor, members who live on the ground floor are required to provide their assistance; hence an operation on the top floor needs to go all the way down to the ground floor. For this reason our tower needs more than a staircase: it needs an efficient way to move between the levels, something even better than an elevator.</p><p>What if we use portals? Imagine anyone on the twelfth floor using a portal to go down immediately to the sixth floor, then another one connecting the sixth to the second floor, and finally another portal connecting to the ground floor. One can thus reach the ground floor much faster than by descending a long staircase.</p><p>The building analogy can be used to understand and construct a tower of finite fields. We use only a few intermediate extensions to build a twelfth-degree extension from the (ground) prime field \(\mathbb{F}_p\).</p><p>\(\mathbb{F}_{p}\) ⇒  \(\mathbb{F}_{p^2}\) ⇒  \(\mathbb{F}_{p^6}\) ⇒  \(\mathbb{F}_{p^{12}}\)</p><p>In fact, the extension of finite fields is as follows:</p><ul><li><p>\(\mathbb{F}_{p^2}\) is built as polynomials in \(\mathbb{F}_p[u]\) reduced modulo \(u^2+1=0\).</p></li><li><p>\(\mathbb{F}_{p^6}\) is built as polynomials in \(\mathbb{F}_{p^2}[v]\) reduced modulo \(v^3+u+1=0\).</p></li><li><p>\(\mathbb{F}_{p^{12}}\) is built as polynomials in \(\mathbb{F}_{p^6}[w]\) reduced modulo \(w^2+v=0\), or as polynomials in \(\mathbb{F}_{p^4}[w]\) reduced modulo \(w^3+v=0\).</p></li></ul><p>The portals here are the polynomials used as moduli, as they allow us to move from one extension to the other.</p><p>Different constructions for the higher extensions have an impact on the number of operations performed. 
Thus, we implemented the latter tower for \(\mathbb{F}_{p^{12}}\), as it results in a lower number of operations. The arithmetic operations at this level are fairly easy to implement and verify manually, so formal verification is not as effective here as it is for prime field arithmetic. Still, an automated tool that generates code for this arithmetic would be useful for developers not familiar with the internals of field towering; the fiat-crypto project is tracking this idea [<a href="https://github.com/mit-plv/fiat-crypto/issues/904">Issue 904</a>, <a href="https://github.com/mit-plv/fiat-crypto/issues/851">Issue 851</a>].</p><p>Next, we describe the main core operations of a bilinear pairing in more detail.</p>
    <div>
      <h3>The Miller loop and the final exponentiation</h3>
      <a href="#the-miller-loop-and-the-final-exponentiation">
        
      </a>
    </div>
    <p>The pairing function we implemented is the <a href="https://eprint.iacr.org/2008/096">optimal r-ate</a> pairing, which is defined as:</p><p>\( e(P,Q) = f_Q(P)^{\text{exp}} \)</p><p>That is, we construct a function \(f\) based on \(Q\), evaluate it at the point \(P\), and raise the result to a specific power. The function evaluation is performed efficiently using “the Miller loop”, an algorithm devised by <a href="https://crypto.stanford.edu/miller/miller.pdf">Victor Miller</a> with a structure similar to the double-and-add algorithm for scalar multiplication.</p><p>The computed value \(f_Q(P)\) is an element of \(\mathbb{F}_{p^{12}}\), but it is not yet an \(r\)-root of unity; the final exponentiation accomplishes this task. Since the exponent is constant for each curve, special algorithms can be tuned for it.</p><p>One interesting acceleration opportunity presents itself: in the Miller loop, the elements of \(\mathbb{F}_{p^{12}}\) that we have to multiply by are special: as polynomials, they have no linear term and their constant term lives in \(\mathbb{F}_{p^{2}}\). We created a specialized multiplication that skips the products involving coefficients known to be zero. This specialization accelerated the pairing computation by 12%.</p><p>So far, we have described how the internal operations of a pairing are performed. There are still other functions surrounding the use of pairings in cryptographic protocols, and it is important to optimize them too; we discuss some of them now.</p>
    <div>
      <h3>Product of Pairings</h3>
      <a href="#product-of-pairings">
        
      </a>
    </div>
    <p>Often protocols want to evaluate a product of pairings rather than a single pairing. This is the case when verifying multiple signatures, or when the protocol uses cancellation between different equations to ensure security, as in the <i>dual system encoding</i> approach to designing protocols. If each pairing were evaluated individually, this would require multiple evaluations of the final exponentiation. Instead, we can evaluate the product first and then apply the final exponentiation once. This requires a different interface that can take vectors of points.</p><p>Occasionally there is a sign or an exponent in the factors of the product. A sign is very easy to deal with explicitly, by negating one of the input points almost for free. General exponents are more complicated, especially considering the need for side-channel protection; but since we expose the interface, later work on the library can accelerate it without changing applications.</p><p>Regarding the exposed API, one of the trickiest and most error-prone aspects of software engineering is input validation. We must check that raw binary inputs decode correctly as the points used for a pairing. Part of this validation is subgroup membership testing, the topic we discuss next.</p>
    <div>
      <h3>Subgroup Membership Testing</h3>
      <a href="#subgroup-membership-testing">
        
      </a>
    </div>
    <p>Checking that a point is on the curve is easy, but checking that it has the right order is not: the classical way to do this is a full, expensive scalar multiplication. Implementing pairings, however, involves many clever tricks that help make things run faster.</p><p>One example is <i>twisting</i>: the group \(\mathbb{G}_2\) consists of points with coordinates in \(\mathbb{F}_{p^{12}}\); however, one can use a smaller field to reduce the number of operations. The trick is to use an associated curve \(E’\), a twist of the original curve \(E\). This allows us to work over the subfield \(\mathbb{F}_{p^{2}}\), which has cheaper operations.</p><p>Additionally, the twisted curve for \(\mathbb{G}_2\) carries some efficiently computable endomorphisms coming from the field structure. For the cost of two field multiplications we can apply such an endomorphism, dramatically decreasing the cost of scalar multiplication.</p><p>By searching for the smallest combination of scalars that zeroes out the \(r\)-torsion points, <a href="https://eprint.iacr.org/2019/814">Sean Bowe</a> came up with a much more efficient way to do subgroup checks. We implemented his trick, with a big reduction in the cost of validation for some applications.</p><p>As can be seen, implementing a pairing is full of subtleties. We just saw that point validation in the pairing setting is a bit more challenging than in the conventional elliptic curve setting. The same kind of reformulation applies to other operations that require special care in their implementation. Another example is how to encode binary strings as elements of the group \(\mathbb{G}_1\) (or \(\mathbb{G}_2\)). Although this operation might sound simple, implementing it securely requires taking several aspects into consideration; we expand on this topic next.</p>
    <div>
      <h3>Hash to Curve</h3>
      <a href="#hash-to-curve">
        
      </a>
    </div>
<p>An important piece of the <a href="https://crypto.stanford.edu/~dabo/papers/bfibe.pdf">Boneh-Franklin</a> Identity-based Encryption scheme is a special hash function that maps an arbitrary string — the identity, e.g., an email address — to a point on an elliptic curve, while still behaving like a conventional cryptographic hash function (such as SHA-256) that is hard to invert and collision-resistant. This operation is commonly known as <i>hashing to curve</i>.</p><p>Boneh and Franklin found a particular way to perform hashing to curve: apply a conventional hash function to the input bitstring, and interpret the result as the \(y\)-coordinate of a point; then, from the curve equation \(y^2=x^3+b\), find the \(x\)-coordinate as \(x=\sqrt[3]{y^2-b}\). The cube root always exists in fields of characteristic \(p\equiv 2 \bmod{3}\). But this algorithm does not apply to other fields in general, which restricts the parameters that can be used.</p><p>Another popular algorithm, though we must stress up front that it is an insecure way to perform hash to curve, is the following. Let the hash of the input be the \(x\)-coordinate, and from it find the \(y\)-coordinate by computing a square root \(y= \sqrt{x^3+b}\). Note that the square root does not exist for every \(x\)-coordinate, which means the algorithm may fail; thus, it is a probabilistic algorithm. To make sure it always succeeds, a counter can be added to \(x\) and incremented each time a square root is not found. Although this version always finds a point on the curve, it also makes the algorithm run in variable time, i.e., it is a non-constant-time algorithm. Implementations of cryptographic algorithms that lack this property are susceptible to timing attacks. 
The <a href="https://wpa3.mathyvanhoef.com/">DragonBlood</a> attack is an example of how a non-constant-time hashing algorithm resulted in a full key recovery of WPA3 Wi-Fi passwords.</p><p>Secure hash to curve algorithms must guarantee several properties. Any input passed to the hash must produce a point in the targeted group; that is, no special input may trigger an exceptional case, and the output point must belong to the <i>correct</i> group. We emphasize the correct group because in certain applications the target group is the entire set of points of an elliptic curve, while in other cases, such as the pairing setting, the target group is a subgroup of the curve: recall that \(\mathbb{G}_1\) and \(\mathbb{G}_2\) are groups of \(r\)-torsion points. Finally, some cryptographic protocols are proven secure provided that the hash to curve function behaves as a random oracle of points. This requirement adds another level of complexity to the hash to curve function.</p><p>Fortunately, several researchers have addressed most of these problems, and others have worked to define a concrete specification of secure algorithms for hashing to curves, extending the sort of geometric trick that worked for the Boneh-Franklin curve. We have participated in the Crypto Forum Research Group (CFRG) at IETF on the work-in-progress Internet <a href="https://datatracker.ietf.org/doc/draft-irtf-cfrg-hash-to-curve/">draft-irtf-cfrg-hash-to-curve</a>. This document specifies secure algorithms for hashing to several elliptic curves, including the BLS12-381 curve. 
At Cloudflare, we actively collaborate in several IETF working groups; see Jonathan Hoyland’s <a href="/cloudflare-and-the-ietf">post</a> to learn more.</p><p>Our implementation complies with the recommendations given in the hash to curve draft and also includes many implementation techniques and tricks derived from a vast number of academic research articles on pairing-based cryptography. A good compilation of most of these tricks is delightfully explained by <a href="https://www.craigcostello.com.au/s/PairingsForBeginners.pdf">Craig Costello</a>.</p><p>We hope this post sheds some light on the development of pairing-based cryptography, as it has become much more relevant these days. We will soon share an interesting use case in which pairing-based cryptography helps us harden the security of our infrastructure.</p>
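To make the timing hazard of the counter-based algorithm described above concrete, here is a toy Python sketch with invented parameters (\(p=19\), \(b=5\)); it deliberately implements the insecure variable-time method, not what the hash-to-curve draft or CIRCL do:

```python
import hashlib

# Toy try-and-increment "hash to curve" on y^2 = x^3 + b over F_p, with
# invented parameters p = 19, b = 5.  This is the INSECURE variable-time
# algorithm described above, shown only to make the hazard concrete.
p, b = 19, 5   # p = 3 (mod 4), so a square root is a single exponentiation

def sqrt_mod_p(v):
    y = pow(v, (p + 1) // 4, p)
    return y if (y * y) % p == v else None   # None: v is not a square

def hash_to_curve(msg: bytes):
    x = int.from_bytes(hashlib.sha256(msg).digest(), "big") % p
    tries = 1
    while True:
        y = sqrt_mod_p((x ** 3 + b) % p)
        if y is not None:
            return (x, y), tries
        x = (x + 1) % p          # counter bump: leaks timing information
        tries += 1

# The number of loop iterations depends on the input, so an attacker who
# can time this function learns something about a possibly secret input.
for msg in [b"alice", b"bob", b"carol"]:
    (x, y), tries = hash_to_curve(msg)
    assert (y * y) % p == (x ** 3 + b) % p   # output is on the curve
    print(msg, (x, y), "iterations:", tries)
```

Different inputs take different numbers of iterations, which is precisely the signal exploited in attacks like DragonBlood; the algorithms in the draft instead run in constant time.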
    <div>
      <h2>What’s next?</h2>
      <a href="#whats-next">
        
      </a>
    </div>
<p>We invite you to use our <a href="https://github.com/cloudflare/circl/">CIRCL</a> library, now equipped with bilinear pairings. But there is more: look at other primitives already available such as <a href="https://github.com/cloudflare/circl/tree/master/hpke">HPKE</a>, <a href="https://github.com/cloudflare/circl/tree/master/oprf">VOPRF</a>, and <a href="https://github.com/cloudflare/circl/tree/master/kem">Post-Quantum</a> algorithms. On our side, we will continue improving the performance and security of our library. Let us know if any of your projects uses CIRCL; we would like to hear about your use case. Reach us at <a href="https://research.cloudflare.com">research.cloudflare.com</a>.</p><p>You’ll soon hear more about how we’re using CIRCL across Cloudflare.</p>
            <category><![CDATA[Research]]></category>
            <category><![CDATA[Cryptography]]></category>
            <category><![CDATA[Go]]></category>
            <guid isPermaLink="false">78HsUIApwC8CnQLeWxAi07</guid>
            <dc:creator>Armando Faz-Hernández</dc:creator>
            <dc:creator>Watson Ladd</dc:creator>
        </item>
        <item>
            <title><![CDATA[Introducing Zero-Knowledge Proofs for Private Web Attestation with Cross/Multi-Vendor Hardware]]></title>
            <link>https://blog.cloudflare.com/introducing-zero-knowledge-proofs-for-private-web-attestation-with-cross-multi-vendor-hardware/</link>
            <pubDate>Thu, 12 Aug 2021 13:00:06 GMT</pubDate>
            <description><![CDATA[ In Cryptographic Attestation of Personhood the server sends a message to the browser that the hardware security signs, demonstrating its authenticity.  ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/cgxMMVpy80lcRN3GuiKGo/ed582d5190626b5a5e78027dfa0f8d8a/image5-4.png" />
            
            </figure><p>A few weeks ago we introduced <a href="/introducing-cryptographic-attestation-of-personhood/">Cryptographic Attestation of Personhood</a> to replace CAPTCHAs with USB security keys, and today we announced additional support for <a href="/cap-expands-support">on-device biometric hardware</a>. While doing that work, it occurred to us that hardware attestation, proving identity or other properties of a user with a piece of hardware, could have many wider applications beyond just CAPTCHA alternatives and user authentication via WebAuthn. Really, why should someone have to have an account to prove they exist, when their own trusted device can do so?</p><p>Attestation in the <a href="https://webauthn.guide/">WebAuthn standard</a> lets websites know that your security key is authentic. It was designed to have good privacy properties baked into policies that must be followed by device manufacturers. The information your security key sends to websites is indistinguishable from that of myriad other keys.  Even so, we wanted to do better. If we’re taking attestation out of authentication, then we need to learn only that your security key is authentic — and we’ve designed a new Zero-Knowledge Proof for the browser to do that.</p><p><a href="/next-generation-privacy-protocols/">This is part of our work to improve privacy across the Internet.</a> We’ve yet to put this proof of personhood in production, but you can see a <a href="https://zkp.cloudflarechallenge.com/">demonstration of the technique in action.</a> We’ve seen it work with YubiKeys among others. Most importantly, we’re <a href="https://github.com/cloudflare/zkp-ecdsa">open-sourcing the code</a> so everyone can benefit and contribute. Read through below for details, as well as next steps.</p>
    <div>
      <h2>Introduction</h2>
      <a href="#introduction">
        
      </a>
    </div>
<p>WebAuthn attestation identifies the manufacturer of your hardware security key to the website that wants the attestation. It was intended for deployment in closed settings like financial institutions and internal services where the website already has a preexisting relationship with you. Since logging in identifies you, the privacy impact was minimal. In contrast, any open website that uses attestation, like we do for proof of personhood, learns the make and model of the key you used.</p><p>Make and model information doesn’t seem that sensitive, just like the make and model of your car doesn’t seem that sensitive. There are a lot of 2015 Priuses out there, so knowing you drive one doesn’t help identify you. But when paired with information such as user agent, language preferences, time of day, etc., it can contribute to building up a picture of the user, just as demographic details, height, weight and clothing together with the make and model of a car combine to make it easier to pinpoint a particular car on the highway. Therefore, browsers show a dialog when a website requests this attestation, to make sure users understand that the website is learning information that may help identify them. We take privacy seriously at Cloudflare, and want to avoid learning any information that could identify you.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6U2GYBi5zLDelGL0o7FRAg/8d8d0c7fd928d813986323e71afc4e29/image6.png" />
            
</figure><p>An example browser warning.</p><p>The information that we see from attestation is a proof that the manufacturer of your security key really did make that key. It’s a digital signature using a private key held on your security key in a secure enclave, together with a certificate chain that leads to the manufacturer. These chains enable any server to see that the hardware security key is authentic. All we want for the Cryptographic Attestation of Personhood is a single bit: that you own a hardware security key that is trustworthy, and none of the details about the manufacturer or model.</p><p>Historically, attestation has been used in environments where only a few manufacturers were considered acceptable. For example, large financial institutions are understandably conservative. In this environment, revealing the manufacturer is necessary. In an open vendor design, we don’t want to privilege any particular manufacturer, but instead just know that the keys are trustworthy.</p><p>Trustworthiness is <a href="https://fidoalliance.org/metadata/metadata-service-overview/">determined by the FIDO MetaData Service</a>. It is a service from the FIDO Alliance, which maintains root certificates for the manufacturers. When these keys are compromised, they are listed as such in the FIDO system. We have automated scripts to download these roots and insert them into releases of our software. This ensures that we are always up-to-date as new manufacturers emerge or older devices are compromised as attackers extract the keys or the keys get mishandled.</p><p>To their credit, the FIDO consortium requires that no fewer than 100,000 devices all share an attestation key, setting a lower bound on the device anonymity set size to minimize the impact of information collection. Alas, this hasn’t always happened. 
Some manufacturers might not have the volume necessary to ever get that batch size, and users shouldn’t have to flock to the biggest ones to have their privacy protected. At Cloudflare, we have a strong <a href="https://www.cloudflare.com/privacypolicy/">privacy policy</a> that governs how we use this information, but we’d prefer not to know your key’s manufacturer at all. Just as we’ve <a href="/deprecating-cfduid-cookie/">removed cookies</a> that we no longer needed, and log data customers need to debug their firewall rules without us being able to <a href="/deprecating-cfduid-cookie/">see it ourselves</a>, we’re always looking for ways to reduce the information that we can see.</p><p>At the same time, we need to make sure that the device that’s responding to our request is a genuine security key and not some software emulation run by a bot. While we don’t care which key it is, we’d like to know that it actually is a key that meets our security requirements and hasn’t been compromised. In essence, we’d like to prove the legitimacy of the credential without learning anything else about it. This problem of anonymous credentials isn’t new, and lots of solutions have been proposed and some even deployed.</p><p>However, these schemes typically require that the hardware or software that implements the credential attestation was designed with the specific scheme in mind. We can’t go out and convince manufacturers to add features in a few months, let alone replace all the hardware authentication security keys in the world. We have to instead search for solutions that work with existing hardware.</p>
    <div>
      <h2>A high-level introduction to Zero-Knowledge Proof</h2>
      <a href="#a-high-level-introduction-to-zero-knowledge-proof">
        
      </a>
    </div>
    <p>At first glance, it seems that we have an impossible task. How can I demonstrate that I know something without telling you what it is? But sometimes this is possible. If I claim to have the key to a mailbox, you can put a letter inside the mailbox, walk away, and ask me to read the letter. If I claim to know your telephone number, you can ask me to call you. Such a proof is known as a <i>zero-knowledge proof</i>, often abbreviated ZKP.</p><p>A classic example of a zero-knowledge proof is showing to someone that you know where Waldo is in <a href="https://en.wikipedia.org/wiki/Where%27s_Wally%3F">Where’s Waldo</a>. While you could point to Waldo on the page, this would tell the person exactly where Waldo is. If however you were to cover up the page with a big piece of paper that has a small hole that only shows Waldo, then the person could only see that Waldo was somewhere on the page, and would be unable to figure out where. They would know that you know where Waldo is, but not know where Waldo is themselves.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/71KQU8ZXzr6akXyFUCg24E/de72cd3d48d2434d69bc8357ea5de68a/image2-16.png" />
            
</figure><p>Sometimes finding Waldo isn’t the problem (Source: <a href="https://commons.wikimedia.org/wiki/File:Where%E2%80%99s_Wally_World_Record_(5846729480).jpg">https://commons.wikimedia.org/wiki/File:Where%E2%80%99s_Wally_World_Record_(5846729480).jpg</a>)</p><p>Cryptographers have designed numerous zero-knowledge proofs and ways to hook them together. The central piece in hooking them together is a commitment, a cryptographic envelope. A commitment prevents tampering with the value that was placed inside when it was made, and later can be opened to show what was placed in it. We use commitment schemes in real life. A magician might seal a piece of paper in an envelope to assure the audience that he cannot touch or tamper with it, and later have someone open the envelope to reveal his prediction. A silent auction involves people submitting sealed envelopes containing bids that are then opened together, making sure that no one can adjust their bid after they see what others have bid.</p><p>One very simple cryptographic protocol that makes use of commitments is coin flipping. Suppose Alice and Bob want to flip a coin, but one of them has a few double-headed quarters and the <a href="https://www.youtube.com/watch?v=AYnJv68T3MM">other can flip a coin so it comes up on the side they want every time</a>. The only way to get a fair flip is for both of them to be involved in a way that makes sure if either is honest, the result is a fair flip. But if they simply each flip a coin and trade the results, whoever goes last could pretend they had gotten the result that would produce the desired outcome.</p><p>Using a commitment scheme solves this problem. Instead of Alice and Bob saying what their results are out loud, they trade commitments to the results. Then they open the commitments. 
Because they traded the commitments, neither of them can pretend to have gotten a different result based on what they learned, as any cheating would be detected when the commitments are opened.</p><p>Commitments are like wires that tie zero-knowledge proofs together, making bigger and more complicated ones from simple ones. By proving that a value in a commitment has two different properties with different zero-knowledge proofs, we can prove both properties hold for the value. This lets us link together proofs for statements like “the value in a commitment is a sum of values in two other commitments” and “the value in a commitment appears in a list” into much more complicated and useful statements.  Since we know how to prove statements like “this commitment is one if and only if both of these other commitments are one” and “this commitment is one if either of these two commitments is one” we have the building blocks to prove any statement. These generic techniques can produce a zero-knowledge proof for any statement in NP, although it will be quite slow and complicated by default.</p>
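The coin-flipping protocol above can be sketched with a hash-based commitment (a simple illustration; the commitments used inside zero-knowledge proofs are algebraic, not hash-based):

```python
import os, hashlib

# Commit-then-reveal coin flip, as described above, using a hash-based
# commitment: commit(bit) = SHA-256(32 random bytes || bit).  The random
# nonce hides the bit; revealing it later "opens the envelope".

def commit(bit: int):
    nonce = os.urandom(32)
    digest = hashlib.sha256(nonce + bytes([bit])).hexdigest()
    return digest, (nonce, bit)        # (public commitment, private opening)

def verify_opening(digest, nonce, bit):
    return hashlib.sha256(nonce + bytes([bit])).hexdigest() == digest

# 1. Alice and Bob each flip locally and trade only the commitments.
alice_c, alice_open = commit(os.urandom(1)[0] & 1)
bob_c, bob_open = commit(os.urandom(1)[0] & 1)

# 2. Both reveal; each checks the other's opening against the commitment,
#    so neither can claim a different flip after seeing the other's.
assert verify_opening(alice_c, *alice_open)
assert verify_opening(bob_c, *bob_open)

# 3. The shared coin is the XOR of the two committed bits: it is uniform
#    as long as at least one party flipped honestly.
result = alice_open[1] ^ bob_open[1]
print("coin:", result)
```

Trying to open a commitment to the other bit fails the check, which is exactly the binding property that stops the last revealer from lying.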
    <div>
      <h2>Our Zero-Knowledge Proof system for the browser</h2>
      <a href="#our-zero-knowledge-proof-system-for-the-browser">
        
      </a>
    </div>
<p>In Cryptographic Attestation of Personhood the server sends a message to the browser that the hardware security key signs, demonstrating its authenticity. Just as a paper signature ensures that the person making it saw it and signed it, a digital signature ensures the identity of the signer.  When we use our zero-knowledge proof, instead of sending the signature, the client sends a proof that the signature was generated by a key on a server-provided list.</p><p>Because we only send the proof to the server, the server learns only that the attestation exists, and not which hardware security key generated it. This guarantees privacy as the identifying information about the security key never leaves the browser. But we need to make sure that proving and verification are efficient enough to carry out at scale, to have a deployable solution.</p><p>We investigated many potential schemes, including <a href="https://z.cash/technology/zksnarks/">SNARKs</a>. Unfortunately, the code size, toolchain requirements, and proving complexity of a SNARK proved prohibitive.  The security of SNARKs relies on more complicated assumptions than the scheme we ultimately went with. Obviously this is an area of active research and the best technology today is not necessarily the best technology of the future.</p><p>For the hardware security keys we support, the digital signature in the attestation was produced by the <a href="https://www.cloudflare.com/learning/dns/dnssec/ecdsa-and-dnssec/">Elliptic Curve Digital Signature Algorithm (ECDSA)</a>.  ECDSA is itself similar to many of the zero-knowledge proofs we use. It starts with the signer computing a point \(R=kG\) on an elliptic curve for a random value \(k\). Then the signer combines the \(x\)-coordinate of the point, written \(r\), with their private key and the hash of the message to compute a value \(s\). The pair \((r, s)\) is the signature. 
The verifier uses \(r\) and \(s\) and the public key to recompute \(R\), and then checks that the \(x\) coordinate matches \(r\).</p><p>Unfortunately, the verification equation as commonly presented involves some operations that would need to convert values from one form to another. In a zero knowledge proof these operations are complex and expensive, with many steps. We had to sidestep this limitation to make our system work. To transform ECDSA into a scheme we can work with, our prover sends \(R\) instead, and commits to a value \(z\) computed from \(r\) and \(s\) that simplifies the verification equation. Anyone can take an ECDSA signature and turn it into a signature for our tweaked scheme and vice versa without using any secret knowledge, so it is just as secure as ECDSA.</p><p>Since the statement we want to prove has two parts — “the message was signed by a key” and “the key is on the list” — it is natural to break up the problem of proving that statement into two pieces. First, the prover demonstrates that the key inside of a commitment signed the message, and then the prover demonstrates the committed key is on a list. The verifier likewise checks these two parts and if both parts work, indicates that the proof is valid.</p><p>To prove that the signature verifies under a key, we had to use a proof that one elliptic curve point is a known power of another. This proof is a fairly straightforward zero-knowledge proof, but some steps themselves require zero-knowledge protocols for proving that points were added correctly and arithmetic was done correctly. This proof consumes the bulk of the time in proof generation and verification. Once this proof is verified, the verifier knows that the message was signed by the committed public key.</p><p>The next step is for the prover to find where their key is on the list, and then prove that their key is in the list. 
To do this we use the zero-knowledge proof developed by <a href="https://eprint.iacr.org/2014/764">Groth and Kohlweiss</a>. Their proof first commits to the binary expansion of the place of the commitment in the list. The prover then proves that the binary expansion really consists of bits, and supplies some extra information about how it was proved. With the extra info and the proofs, both sides can compute polynomials that evaluate to zero if the commitment is to a value on the list. <a href="https://github.com/cloudflare/zkp-ecdsa/blob/main/src/proofGK/gk.ts">This code is surprisingly short for such a complex task</a>.</p>
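The core polynomial trick, stripped of the commitments and blinding that make Groth-Kohlweiss zero-knowledge, can be illustrated in a few lines of Python (the modulus is invented for the toy):

```python
# Membership of v in a public list can be phrased as a polynomial check:
# Z(x) = (x - l_0)(x - l_1)... vanishes exactly at the list elements, so
# Z(v) == 0 iff v is on the list.  Real proofs evaluate such polynomials
# over commitments so that v itself stays hidden; here everything is in
# the clear, purely to show the polynomial idea.
q = 2**61 - 1                 # a prime modulus, invented for the toy

def vanishing_eval(lst, v):
    z = 1
    for elem in lst:
        z = z * (v - elem) % q
    return z

allow_list = [11, 42, 1337]
assert vanishing_eval(allow_list, 42) == 0        # on the list
assert vanishing_eval(allow_list, 43) != 0        # not on the list
```

Note that evaluating the polynomial costs one multiplication per list element, which is why this style of membership check stays efficient as the list grows.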
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3ZggktgWRwkeLIYYeW2DJD/9ff0690df405e57152bc06c4c5d512c1/image7.png" />
            
            </figure><p>By evaluating a polynomial, we show our committed value is a zero.</p><p>The verifier then checks the Groth-Kohlweiss proof that the committed key is on the list, and then makes sure the message that was signed is what it should be. This is a very efficient proof, even as the list size grows: the work done per list element is a multiplication. If all matches, then we know that the signature was generated by a sufficiently secure security key, and nothing else. If it does not match we know that something is wrong.</p>
    <div>
      <h2>Engineering a more efficient curve</h2>
      <a href="#engineering-a-more-efficient-curve">
        
      </a>
    </div>
<p>We turned statements about ECDSA signatures into statements about points on the P-256 elliptic curve, and then into statements about arithmetic in the field that P-256 is defined over. These statements are easiest to prove if we have a group whose size matches the size of that field, and so we had to find one. This posed an interesting challenge as it’s the reverse of how we normally do things in cryptography. If you’d like to see how we solved it, read on; otherwise <a href="#results-of-our-labor">skip ahead</a>.</p><p>Most of the time in elliptic curve cryptography we start with a convenient base field, and search for elliptic curves of prime or nearly prime order with the right properties for our application. This way we can pick primes with properties convenient for computer hardware. When we want pairing-friendly curves, we typically run computer searches over families of curves whose parameters are given by known polynomials.</p><p>But here we wanted a curve with a given number of points, and so we had to use some fairly advanced number-theoretic machinery to determine this curve. Doing so was a big part of getting our zero-knowledge attestation as efficient as it is.</p>
    <div>
      <h3>Elliptic Curves and the Complex Plane</h3>
      <a href="#elliptic-curves-and-the-complex-plane">
        
      </a>
    </div>
<p>Elliptic curves are particularly nice over the complex numbers: an elliptic curve is isomorphic to a torus. More general complex curves are surfaces that may have more than one hole, but an elliptic curve always has exactly one.</p><p>Different elliptic curves are distinguished by how fat or thin the two directions around the torus are with respect to one another. If we imagine slicing the torus open along those two directions, we see that we can get a torus by taking a rectangle and gluing up the sides.  There is an illustrative video of what this looks like: <a href="https://www.youtube.com/watch?v=0H5_h-RB0T8">Gluing a torus</a>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/76IwSmVJZmt3IV6bPcgm9M/83668487eb244aa92e6cc9463b5f9cfb/image4-1.png" />
            
            </figure><p>Instead of taking one rectangle and gluing it up, we can imagine taking the entire plane, and then folding it up so that every rectangle lines up. In doing so the corners of these rectangles all line up over the origin. The corners form what we call a lattice, and we can always scale and rotate to have one of the generators of the lattice be 1.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1SKztvoPZCrEmwZi1n74PI/f6b79518632163fd8aa6d40a49d538c4/image1-2.png" />
            
</figure><p>Viewed this way, addition of complex numbers becomes addition on the torus, just as addition of the integers modulo 2 is addition of integers, then reduced mod 2. But with elliptic curves we’re used to having algebraic equations for addition and multiplication, and also for the curves themselves. How does this analytic picture interact with the algebraic one?</p><p>Via a great deal of classical complex geometry we find they are closely related. The ring of complex-valued functions on the lattice is generated by the Weierstrass \(\mathcal{P}\) function and its derivative. These satisfy an algebraic equation of the form \(y^2=x^3+ax+b\), and the parameters \(a\) and \(b\) are functions of the lattice parameter. The classical formulas for the algebraic addition of points on elliptic curves emerge from this.</p><p>One of the parameters is the \(j\) invariant, which is a more arithmetically meaningful invariant than the lattice parameter \(\tau\). The \(j\) invariant is an invariant of the elliptic curve: values of \(\tau\) that give rise to the same lattice produce the same \(j\) invariant, while they may have different \(a\) and \(b\).  There are many expressions for \(j\); in terms of the coefficients above, one is \(j = 1728\,\frac{4a^3}{4a^3+27b^2}\) (see <a href="https://dlmf.nist.gov/23.15#E7">DLMF 23.15.7</a>).</p>
    <div>
      <h3>Complex multiplication and class number</h3>
      <a href="#complex-multiplication-and-class-number">
        
      </a>
    </div>
<p>Suppose we take the lattice \(\{1, i\}\). If we multiply this lattice by \(i\), we get back \(\{i, -1\}\), which generates the same set of points. This is exceptional, and can only happen when the number we multiply by satisfies a quadratic equation. The elements of the lattice are then closely related to the solutions of that quadratic equation.</p><p>Associated with such a lattice is a discriminant: the discriminant of the associated quadratic field. For our example with \(i\) the discriminant is \(-4\), the discriminant of the quadratic equation \(x^2+1\). If, for instance, we were to take \(\sqrt{-5}\) instead and consider the lattice \( \{1, \sqrt{-5}\} \), the discriminant would be \(-20\), the discriminant of \(x^2+5\). Note that there are different definitions of the discriminant, which change the sign and add various powers of \(2\).</p><p>Maps from an elliptic curve to itself are called endomorphisms. Most elliptic curves just have multiplication by integers as endomorphisms. But some curves have additional endomorphisms. For instance, if we turn the lattice \(\{1, i\}\) into an elliptic curve, we obtain \(y^2=x^3+x\). Now this curve has an extra endomorphism: if I send \(y\) to \(iy\) and \(x\) to \(-x\), I get a point that satisfies the curve equation as \((iy)^2=(-x)^3+(-x)\). Doing this map twice produces the same effect as inverting a point, and it’s no coincidence that multiplying twice by \(i\) sends a complex number \(z\) to \(-z\). So this extra endomorphism and multiplying by \(i\) satisfy the same equation. Having an extra endomorphism is called complex multiplication, as it is multiplication by a complex number. When the lattice an elliptic curve comes from has complex multiplication, the elliptic curve also has complex multiplication and vice versa.</p><p>Any set of mathematical objects comes with questions, and elliptic curves with complex multiplication are no exception. 
We can ask how many elliptic curves with complex multiplication there are for a given discriminant. How does that number grow as the discriminant grows? Some of these questions are still open today, despite years of research and computer experimentation. Key to approaching them is a link between lattices and arithmetic.</p><p>In the early 19th century, Gauss studied binary quadratic forms, expressions of the form \(ax^2+bxy+cy^2\). Such forms are said to be equivalent if there is a substitution with integer coefficients for \(x\) and \(y\) that takes one into the other. This is a core notion, and in addition to algorithms for reducing a binary quadratic form, Gauss demonstrated that there was a composition law that made binary quadratic forms of a given discriminant into a group.</p><p>Later number theorists would develop the concept of an ideal, tying quadratic forms to the failure of factorization to be unique. \(x^2+5y^2\) and \(2x^2+2xy+3y^2\) are both quadratic forms of discriminant \(-20\), and this is connected to the failure of unique factorization in \(\mathbb{Z}[\sqrt{-5}]\).</p><p>When we consider binary quadratic forms as giving the squared lengths of vectors in a plane, each lattice gives a binary quadratic form up to equivalence. The elliptic curves with complex multiplication by a given discriminant thus correspond to the classes of binary quadratic forms with a given discriminant, which connects to the arithmetic in the quadratic field. Three very different-looking questions therefore all have the same answer.</p>
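Because every positive-definite form is equivalent to exactly one reduced form, the number of classes per discriminant can be counted by brute force. A short Python sketch, using the standard reduction conditions \(|b| \le a \le c\), with \(b \ge 0\) when \(|b| = a\) or \(a = c\) (treat the enumeration as an illustration, not a number-theory library):

```python
# Enumerate reduced positive-definite binary quadratic forms
# a*x^2 + b*x*y + c*y^2 of discriminant D < 0: b^2 - 4ac = D with
# |b| <= a <= c, and b >= 0 whenever |b| == a or a == c.  Each
# equivalence class contains exactly one reduced form, so the count
# is the class number.
def reduced_forms(D):
    assert D < 0 and D % 4 in (0, 1)
    forms = []
    a = 1
    while 3 * a * a <= -D:               # reduction forces a <= sqrt(|D|/3)
        for b in range(-a, a + 1):
            if (b * b - D) % (4 * a):
                continue
            c = (b * b - D) // (4 * a)
            if c < a:
                continue
            if b < 0 and (b == -a or a == c):
                continue                 # keep one of two equivalent signs
            forms.append((a, b, c))
        a += 1
    return forms

# Discriminant -20: the two classes mentioned above.
print(reduced_forms(-20))   # [(1, 0, 5), (2, 2, 3)]
# Discriminant -4 (the lattice {1, i}): a single class, x^2 + y^2.
print(reduced_forms(-4))    # [(1, 0, 1)]
```

For discriminant \(-20\) the enumeration recovers exactly \(x^2+5y^2\) and \(2x^2+2xy+3y^2\), matching the class number 2 tied to the failure of unique factorization in \(\mathbb{Z}[\sqrt{-5}]\).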
    <div>
      <h3>Why complex multiplication matters for finding curves</h3>
      <a href="#why-complex-multiplication-matters-for-finding-curves">
        
      </a>
    </div>
<p>When we take an elliptic curve with integral coefficients and reduce it modulo a prime \(p\), it gets an extra endomorphism: the Frobenius endomorphism that sends \(x\rightarrow x^p\) and \(y\rightarrow y^p\). This endomorphism satisfies a quadratic equation, and the linear term of that quadratic equation is the number of points minus \(p+1\).</p><p>If the elliptic curve has complex multiplication, there is another endomorphism, namely the one we get from complex multiplication. But an elliptic curve can only have one extra endomorphism unless it is supersingular. Supersingular curves are very rare. So the Frobenius endomorphism and the endomorphism from complex multiplication must be the same. Because we started out with complex multiplication, we know the quadratic equation the Frobenius must satisfy, and hence the number of points.</p><p>This is the conclusion of our saga: to find an elliptic curve with a given order, find integer solutions to the equation \(t^2+Dy^2=4N\) for small \(D\) and let \(p=N-t+1\) and see if it is prime. Then factor the Hilbert class polynomial of \(D\) over \(p\), and take one of the roots modulo \(p\) as the \(j\) invariant of our elliptic curve. We may need to take a quadratic twist to get the right number of points, since \(t\) is only identified up to sign.</p><p>This gave us the curve we needed for efficient proving of relations over the base field of P-256. All of this mathematics boils down to a script that runs in a few minutes and outputs the curve with the desired order.</p><h2>The results of our labor</h2><p>After all this work, and much additional engineering work needed to make the proof run faster through optimizing little bits of it, we can generate a proof in a few seconds, and verify it in a few hundred milliseconds. This is fast enough to be practical, and means that websites that want to verify the security of security keys can do so without negative privacy impacts.</p>
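The Diophantine first step of the curve-finding recipe above is simple enough to sketch in Python (with a toy target order \(N\); the sketch stops before the Hilbert class polynomial step, which is where the real machinery lives):

```python
import math

# First, purely Diophantine step of the recipe above: given a desired
# group order N, search small D and solve t^2 + D*y^2 = 4*N, then test
# whether p = N - t + 1 is prime.  This stops short of the Hilbert class
# polynomial step that actually constructs the curve.  N is a toy value.
def is_prime(n: int) -> bool:
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def cm_candidates(N: int, max_D: int = 50):
    out = []
    for D in range(1, max_D + 1):
        y = 1
        while D * y * y <= 4 * N:
            rem = 4 * N - D * y * y
            t = math.isqrt(rem)
            if t * t == rem:              # found t^2 + D*y^2 = 4*N
                for s in (t, -t):         # t is only identified up to sign
                    p = N - s + 1
                    if p > 3 and is_prime(p):
                        out.append((D, s, y, p))
            y += 1
    return out

N = 1021                                  # toy target group order
for D, t, y, p in cm_candidates(N)[:3]:
    print(f"D={D} t={t} y={y} p={p}")
```

Each candidate \((D, t, y, p)\) is a prime field over which a curve with exactly \(N\) points exists; pinning down that curve's \(j\) invariant is the job of the Hilbert class polynomial.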
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3SvZ5ztG9O6ogOMnj9BMh4/27662079e973a494689dd2be4d584c69/image3-4.png" />
            
            </figure><p>All a website that uses our technique learns is whether or not the signature was generated by a token whose attestation key is on the list they provided. Unlike using WebAuthn directly, they do not get any more detailed information, even if the manufacturer has accidentally made the batch too small. Instead of having a policy-based approach to guarding user privacy, we’ve removed the troublesome information.</p>
    <div>
      <h2>Next steps — a community effort!</h2>
      <a href="#next-steps-a-community-effort">
        
      </a>
    </div>
    <p>Our demonstration shows that we have a working privacy enhancement based on a zero-knowledge proof. We’re continuing to improve it, adding more performance and security features. But our task isn’t done until we have a privacy-preserving WebAuthn extension that works in every browser, giving users assurance that their data is not leaving the device.</p><p>What we have now is a demonstration of what is possible: using zero-knowledge proofs to turn WebAuthn attestation into a system that treats every manufacturer equally by design, protects user privacy, and can be used by every website. The challenges around user privacy that are created by using attestation on a wide scale are solvable.</p><p>There’s much more that goes into a high-quality, reliable system than the core cryptographic insights. To keep the user experience free of warnings about the information our zero-knowledge proof discards, we need more integration with the browser. We also need a safe way for users of devices not on our list to send their key to us and demonstrate that it should be trusted, and a way to make sure that the list isn’t being abused to try to pinpoint particular keys.</p><p>In addition, this verification is more heavyweight than the older verification methods, so servers that implement it need to incorporate rate limiting and other protections against abuse. SNARKs would be a big advantage here, but come at a cost in code size for the demonstration. Ultimately, bringing these improvements into a core part of the web ecosystem requires working with users, browsers, and other participants to find a solution that works for them. We would like to hear from you at <a href="#">ask-research@cloudflare.com</a> if you would like to contribute to the process.</p><p>Our Cryptographic Attestation of Personhood gives users an easier way to demonstrate their humanity, and one which is more privacy-preserving than many CAPTCHA alternatives or providers. 
But we weren’t satisfied with the state of the art and saw a way to apply advanced cryptography techniques to improve the privacy of our users. Our work shows zero-knowledge proofs can enhance the privacy offered by real world protocols.</p> ]]></content:encoded>
            <category><![CDATA[Privacy]]></category>
            <category><![CDATA[Research]]></category>
            <guid isPermaLink="false">4Gsjdozr6dVrJy0EffkLQK</guid>
            <dc:creator>Watson Ladd</dc:creator>
        </item>
        <item>
            <title><![CDATA[NTS is now an RFC]]></title>
            <link>https://blog.cloudflare.com/nts-is-now-rfc/</link>
            <pubDate>Thu, 01 Oct 2020 14:53:04 GMT</pubDate>
            <description><![CDATA[ After much hard work, NTS finally becomes an official RFC.This means that Network Time Security (NTS) is officially part of the collection of protocols that makes the Internet work.  ]]></description>
            <content:encoded><![CDATA[ <p>Earlier today, the document describing Network Time Security for NTP officially became RFC 8915. This means that Network Time Security (NTS) is officially part of the collection of protocols that makes the Internet work. We’ve changed our time service to use the officially assigned port of 4460 for NTS key exchange, so you can use our service with ease. This is big progress towards securing a ubiquitous Internet protocol.</p><p>Over the past months, we’ve seen many users of our time service, but very few using Network Time Security. This leaves computers vulnerable to attacks that imitate the server they use to obtain time via NTP. Part of the problem was the lack of available NTP daemons that supported NTS. That problem is now solved: <a href="https://chrony.tuxfamily.org/">chrony</a> and <a href="https://www.ntpsec.org/">ntpsec</a> both support NTS.</p><p>Time underlies the security of many of the protocols such as TLS that we rely on to secure our online lives. Without accurate time, there is no way to determine whether or not credentials have expired. The absence of an easily deployed secure time protocol has been a problem for Internet security.</p><p>Without NTS or symmetric key authentication, there is no guarantee that your computer is actually talking NTP with the computer you think it is. Symmetric key authentication is difficult and painful to set up, but until recently has been the only secure and standardized mechanism for authenticating NTP. NTS uses the work that goes into the Web Public Key Infrastructure to authenticate NTP servers and ensure that when you set up your computer to talk to time.cloudflare.com, that’s the server your computer gets the time from.</p><p>Our involvement in developing and promoting NTS included building a specialized server and releasing its source code, participating in the standardization process, and working with implementers to hunt down bugs. 
We also set up <a href="/secure-time/">our time service</a> with support for NTS from the beginning, and it was a useful resource for implementers to test interoperability.</p>
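<p>As a concrete example, recent versions of chrony (4.0 and later) support NTS directly; pointing a machine at our service is then a one-line change to its configuration, with key establishment happening over the newly assigned port 4460 by default:</p>

```
# /etc/chrony/chrony.conf
# The nts option enables Network Time Security for this time source;
# NTS key establishment uses the IANA-assigned TCP port 4460.
server time.cloudflare.com iburst nts
```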
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/12xa0bYS9ER14HPccVDE3g/f4565fc6e070cce69096e665912b7db1/pasted-image-0.png" />
            
            </figure><p>NTS operation diagram</p><p>When Cloudflare first supported TLS 1.3, browsers were actively updating, and so deployment quickly took hold. However, the long tail of legacy installs and extended support releases slowed adoption. Similarly, until Let’s Encrypt made encryption easy for web servers, most web traffic was not encrypted.</p><p>By contrast, <a href="https://www.cloudflare.com/learning/access-management/what-is-ssh/">ssh</a> quickly displaced telnet as the way to access remote systems: the security benefits were substantial, and the experience was better. Adoption of protocols is slow, but when there is a real security need, it can be much faster. NTS is a real security improvement that is vital to adopt. We’re proud to continue making the Internet a better place by supporting secure protocols.</p><p>We hope that operating systems will incorporate NTS support and TLS 1.3 in their supplied NTP daemons. We also urge administrators to deploy NTS as quickly as possible, and NTP server operators to adopt NTS. With Let’s Encrypt-provided certificates, this is simpler than it has been in the past.</p><p>We’re continuing our work in this area with the development of the Roughtime protocol for even better security, as well as engagement with the standardization process to help develop the future of Internet time.</p><p>Cloudflare allows any device to point to time.cloudflare.com, and our service supports NTS. Just as our Universal SSL made it easy for any website to get the security benefits of TLS, our time service makes it easy for any computer to get the benefits of secure time.</p>
            <category><![CDATA[IETF]]></category>
            <category><![CDATA[DNS Security]]></category>
            <category><![CDATA[Research]]></category>
            <category><![CDATA[Cryptography]]></category>
            <guid isPermaLink="false">7dwrp6Zcx5kyy5iR0OwoLf</guid>
            <dc:creator>Watson Ladd</dc:creator>
        </item>
        <item>
            <title><![CDATA[Internship Experience: Cryptography Engineer]]></title>
            <link>https://blog.cloudflare.com/internship-experience-cryptography-engineer/</link>
            <pubDate>Thu, 09 Apr 2020 11:00:00 GMT</pubDate>
            <description><![CDATA[ Back in the summer of 2017 I was an intern at Cloudflare. During the scholastic year I was a grad student working on automorphic forms and computational Langlands at Berkeley. ]]></description>
            <content:encoded><![CDATA[ <p></p><p>Back in the summer of 2017 I was an intern at Cloudflare. During the scholastic year I was a graduate student working on <a href="https://en.wikipedia.org/wiki/Automorphic_form">automorphic forms</a> and <a href="https://icerm.brown.edu/programs/sp-f15/">computational Langlands</a> at Berkeley: a part of number theory with deep connections to <a href="https://en.wikipedia.org/wiki/Representation_theory">representation theory</a>, aimed at uncovering some of the deepest facts about <a href="https://en.wikipedia.org/wiki/Algebraic_number_field">number fields</a>. I had also gotten involved in Internet standardization and security research, but much more on the applied side.</p><p>While I had published papers in computer security and had coded for my dissertation, building and deploying new protocols to production systems was going to be new. Going from the academic environment of little day to day supervision to the industrial one of more direction; from greenfield code that would only ever be run by one person to large projects that had to be understandable by a team; from goals measured in years or even decades, to goals measured in days, weeks, or quarters; these transitions would present some challenges.</p><p>Cloudflare at that stage was a very different company from what it is now. Entire products and offices simply did not exist. <a href="https://www.cloudflare.com/products/argo-smart-routing/">Argo</a>, now a mainstay of our offering for sophisticated companies, was slowly emerging. <a href="https://teams.cloudflare.com/access/index.html">Access</a>, which has been helping safeguard employees working from home these past weeks, was then experiencing teething issues. <a href="https://workers.cloudflare.com/">Workers</a> was being extensively developed for launch that autumn. 
<a href="/introducing-quicksilver-configuration-distribution-at-internet-scale/">Quicksilver</a> was still in the slow stages of replacing KyotoTycoon. Lisbon wasn’t on the map, and Austin was very new.</p>
    <div>
      <h3>Day 1</h3>
      <a href="#day-1">
        
      </a>
    </div>
    <p>My first job was to get my laptop working. Quickly I discovered that despite the promise of using either Mac or Linux, only Mac was supported as a local development environment. Most Linux users would take a good part of a month to tweak all the settings and get the local development environment up. I didn’t have months. After three days, I broke down and got a Mac.</p><p>Needless to say, I asked for some help. Like a drowning man in quicksand, I managed to attract three engineers to this near-insoluble problem of the edge dev stack, and after days of hacking on it, fixing problems that had long been ignored, we got it working well enough to test a few things. That development environment is now gone and replaced with one built on Kubernetes VMs, and works much better that way. When things work on your machine, you can now send everyone your machine.</p>
    <div>
      <h3>Speeding up</h3>
      <a href="#speeding-up">
        
      </a>
    </div>
    <p>With setup complete enough, it was on to the problem we needed to solve. Our goal was to implement a set of three interrelated Internet drafts, one defining secondary certificates, one defining external authentication with TLS certificates, and a third permitting servers to advertise the websites they could serve.</p><p><a href="https://tools.ietf.org/html/draft-ietf-tls-exported-authenticator-11">External authentication</a> is a TLS feature that permits a server or a client on an already opened connection to prove its possession of the private key of another certificate. This proof of possession is tied to the TLS connection, avoiding attacks on bearer tokens caused by the lack of this binding.</p><p><a href="https://tools.ietf.org/html/draft-ietf-httpbis-http2-secondary-certs-05">Secondary certificates</a> is an HTTP/2 feature enabling clients and servers to send certificates together with proof that they actually know the private key. This feature has many applications such as certificate-based authentication, but also enables us to prove that we are permitted to serve the websites we claim to serve.</p><p>The last draft was the <a href="https://tools.ietf.org/html/rfc8336">HTTP/2 ORIGIN frame</a>. The ORIGIN frame enables a website to advertise other sites that it could serve, permitting more connection reuse than allowed under the traditional rules. Connection reuse is an important part of browser performance as it avoids much of the setup of a connection.</p><p>These drafts solved an important problem for Cloudflare. Many resources such as JavaScript, CSS, and images hosted by one website can be used by others. Because Cloudflare proxies so many different websites, our servers have often cached these resources as well. Browsers, though, do not know that these different websites are made faster by Cloudflare, and as a result they repeat all the steps to request the subresources again. 
This takes unnecessary time, since a perfectly good, established connection already exists. If the browser knew this, it could reuse that connection.</p><p>We could only solve this problem by getting browsers and the broader community of TLS implementers on board. Some of these drafts, such as external authentication and secondary certificates, had a broader set of motivations, such as getting certificate-based authentication to work with HTTP/2 and TLS 1.3. All of these needs had to be addressed in the drafts, even if we were only implementing a subset of the uses.</p><p>Successful standards cover the use cases that are needed while being simple enough to implement and achieve interoperability. Implementation experience is essential to achieving this success: a standard with no implementations fails to incorporate hard-won lessons. Computers are hard.</p>
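<p>The wire format of the ORIGIN frame is pleasantly small. RFC 8336 assigns it frame type 0xc, sent on stream 0, with a payload that is just a sequence of length-prefixed ASCII origins. Here is a hedged Go sketch of the encoding (the function name is our own, not from any real HTTP/2 library):</p>

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeOriginFrame builds a complete HTTP/2 ORIGIN frame (RFC 8336):
// a 9-byte HTTP/2 frame header (type 0xc, no flags, stream 0) followed
// by one Origin-Entry per origin, each a 16-bit length prefix plus the
// ASCII serialization of the origin. Real implementations would hook
// this into their HTTP/2 framing layer instead.
func encodeOriginFrame(origins []string) []byte {
	var payload []byte
	for _, o := range origins {
		var lenBuf [2]byte
		binary.BigEndian.PutUint16(lenBuf[:], uint16(len(o)))
		payload = append(payload, lenBuf[:]...)
		payload = append(payload, o...)
	}
	header := make([]byte, 9)
	header[0] = byte(len(payload) >> 16) // 24-bit payload length
	header[1] = byte(len(payload) >> 8)
	header[2] = byte(len(payload))
	header[3] = 0xc // ORIGIN frame type
	// header[4] (flags) and header[5:9] (stream ID 0) remain zero.
	return append(header, payload...)
}

func main() {
	frame := encodeOriginFrame([]string{
		"https://example.com",
		"https://static.example.com",
	})
	fmt.Printf("% x\n", frame)
}
```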
    <div>
      <h3>Prototype</h3>
      <a href="#prototype">
        
      </a>
    </div>
    <p>My first goal was to set up a simple prototype to test the much more complex production implementation, as well as to share outside of Cloudflare so that others could have confidence in their implementations. But these drafts that had to be implemented in the prototype were incremental improvements to an already massive stack of TLS and HTTP standards.</p><p>I decided it would be easiest to build on top of an already existing implementation of TLS and HTTP. I picked the Go standard library as my base: it’s simple, readable, and in a language I was already familiar with. There was already a basic demo showcasing support in Firefox for the ORIGIN frame, and it would be up to me to extend it.</p><p>Using that as my starting point, I was able in three weeks to set up a demonstration server and a client. This showed good progress, and that nothing in the specification was blocking implementation. But without integrating it into our servers for further experimentation, we might not discover rare issues that could be showstoppers. This was a bitter lesson learned from TLS 1.3, where it took months to track down a <a href="https://mailarchive.ietf.org/arch/msg/tls/i9blmvG2BEPf1s1OJkenHknRw9c/">single brand of printer</a> that was incompatible with the standard and forced a change.</p>
    <div>
      <h3>From Prototype to Production</h3>
      <a href="#from-prototype-to-production">
        
      </a>
    </div>
    <p>We also wanted to understand the benefits with some real world data, to convince others that this approach was worthwhile. Our position as a provider to many websites globally gives us diverse, real world data on performance that we use to make our products better, and perhaps more important, to learn lessons that help everyone make the Internet better. As a result we had to implement this in production: the experimental framework for TLS 1.3 development had been removed and we didn’t have an environment for experimentation.</p><p>At the time everything at Cloudflare was based on variants of NGINX. We had extended it with modules to implement features like <a href="/keyless-ssl-the-nitty-gritty-technical-details/">Keyless</a> and customized certificate handling to meet our needs, but much of the business logic was and is carried out in Lua via OpenResty.</p><p>Lua has many virtues, but at the time both the TLS termination and the core business logic lived in the same repo despite being different processes at runtime. This made it very difficult to understand what code was running when, and changes to basic libraries could create problems for both. The build system for this creation had the significant disadvantage of building the same targets with different settings. Lua also is a very dynamic language, but unlike the dynamic languages I was used to, there was no way to interact with the system as it was running on requests.</p><p>The first step was implementing the ORIGIN frame. In implementing this, we had to figure out which sites hosted the subresources used by the page we were serving. Luckily, we already had this logic to enable server push support driven by Link headers. Building on this let me quickly get ORIGIN working.</p><p>This work wasn’t the only thing I was up to as an intern. I was also participating in weekly team meetings, attending our engineering presentations, and getting a sense of what life was like at Cloudflare. 
We had an excursion for interns to the Computer History Museum in Mountain View and Moffett Field, where we saw the base museum.</p><p>The next challenge was getting the CERTIFICATE frame to work. This was a much deeper problem. NGINX processes a request in phases, and some of the phases, like the header processing phase, do not permit network I/O without locking up the event loop. Since we are parsing the headers to determine what to send, the frame is created in the header processing phase. But finding a certificate and telling Keyless to sign it required network I/O.</p><p>The standard solution to this problem is to have Lua execute a timer callback, in which network I/O is possible. But this context doesn’t have any data from the request: some serious refactoring was needed to create a way to get the keyless module to function outside the context of a request.</p><p>Once the signature was created, the battle was half over. Formatting the CERTIFICATE frame was simple, but it had to be stuck into the connection associated with the request that had demanded it be created. And there was no reason to expect the request was still alive, and no way to know what state it was in when the request was handled by the Keyless module.</p><p>To handle this issue I made a shared btree indexed by a number containing space for the data to be passed back and forth. This enabled the request to record that it was ready to send the CERTIFICATE frame and Keyless to record that it was ready with a frame to send. Whichever of these happened second would do the work to enqueue the frame to send out.</p><p>This was not an easy solution: the Keyless module had been written years before and largely unmodified. It fundamentally assumed it could access data from the request, and changing this assumption opened the door to difficult to diagnose bugs. 
It integrates into BoringSSL callbacks through some pretty tricky mechanisms.</p><p>However, I was able to test it using the client from the prototype and it worked. Unfortunately, when I pushed the commit in which it worked upstream, the CI system could not find the git repo where the client prototype lived, due to a setting I had forgotten to change. The CI system unfortunately didn’t associate this failure with the branch, but attempted to check it out whenever it checked out any other branch people were working on. Murphy ensured my accomplishment had happened on a Friday afternoon Pacific time, and the team that manages the SSL server was then exclusively in London…</p><p>Monday morning, the issue was quickly fixed, and whatever tempers had frayed were smoothed over when we discovered the deficiency in the CI system that had enabled a single branch to break every build. It’s always tricky to work in a global team. Later Alessandro flew to San Francisco for a number of projects with the team here and we worked side by side trying to get a demonstration working on a test site. Unfortunately, there was some difficulty tracking down a bug that prevented it from working in production. We had run out of time, and my internship was over. Alessandro flew back to London, and I flew to Idaho to see the eclipse.</p>
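<p>The rendezvous pattern described above, where whichever side finishes second performs the send, is worth seeing in miniature. The following Go sketch is purely illustrative (the real code was C and Lua inside NGINX, keyed through a shared btree rather than a map):</p>

```go
package main

import (
	"fmt"
	"sync"
)

// rendezvous mirrors the shape of the shared-table trick: the request
// side and the Keyless signing side each record their half of the state
// under a shared ID, and whichever side arrives second does the enqueue.
type rendezvous struct {
	mu    sync.Mutex
	slots map[uint64]*slot
}

type slot struct {
	frame    []byte // signed CERTIFICATE frame, once signing completes
	reqReady bool   // the request is ready to have the frame enqueued
}

func newRendezvous() *rendezvous {
	return &rendezvous{slots: make(map[uint64]*slot)}
}

// frameReady is called by the signing side. It returns the frame if the
// request arrived first (so this side must enqueue it), or nil if the
// request side will pick it up later.
func (r *rendezvous) frameReady(id uint64, frame []byte) []byte {
	r.mu.Lock()
	defer r.mu.Unlock()
	if s, found := r.slots[id]; found && s.reqReady {
		delete(r.slots, id)
		return frame // we arrived second: do the enqueue
	}
	r.slots[id] = &slot{frame: frame}
	return nil
}

// markRequestReady is the mirror image, called from the request side.
func (r *rendezvous) markRequestReady(id uint64) []byte {
	r.mu.Lock()
	defer r.mu.Unlock()
	if s, found := r.slots[id]; found {
		delete(r.slots, id)
		return s.frame // the frame arrived first: we do the enqueue
	}
	r.slots[id] = &slot{reqReady: true}
	return nil
}

func main() {
	r := newRendezvous()
	r.frameReady(7, []byte("CERTIFICATE frame"))
	fmt.Printf("request side got: %q\n", r.markRequestReady(7))
}
```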
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/30sarsTRFHEjBNBnEUxUAO/bae850e53e192324bd0982a9b1fadf4f/eclipse-glasses_2x.png" />
            
            </figure>
    <div>
      <h3>The End</h3>
      <a href="#the-end">
        
      </a>
    </div>
    <p>Ultimately, we weren’t able to integrate this feature into the software at our edge: the risks of such intrusive changes for a very experimental feature outweighed the benefits. With not much prospect of support by clients, it would be difficult to get the real savings in performance promised. There were also nontechnical issues in standardization that have made this approach more difficult to implement: any form of traffic direction that doesn’t obey DNS creates issues for network debugging, and there were concerns about the impact of certificate misissuance.</p><p>While the project was less successful than I hoped it would be, I learned a lot of important skills: collaborating on large software projects, working with git, and communicating with other implementers about issues we found. I also got a taste of what it would be like to be on the Research team at Cloudflare, turning research from idea into practical reality, and this ultimately confirmed my choice to go into industrial research.</p><p>I’ve now returned to Cloudflare full-time, working on extensions for TLS as well as time synchronization. These drafts have continued to progress through the standardization process, and we’ve contributed some of the code I wrote as a starting point for other implementers to use. If we knew all our projects would work out, they wouldn’t be ambitious enough to be research worth doing.</p><p><i>If this sort of research experience appeals to you,</i> <a href="https://boards.greenhouse.io/cloudflare/jobs/608495?gh_jid=608495"><i>we’re hiring</i></a>.</p>
            <category><![CDATA[TLS]]></category>
            <category><![CDATA[Internship Experience]]></category>
            <category><![CDATA[Careers]]></category>
            <category><![CDATA[Life at Cloudflare]]></category>
            <category><![CDATA[Cryptography]]></category>
            <guid isPermaLink="false">1LyQra36DRmYl8c1AzEcr2</guid>
            <dc:creator>Watson Ladd</dc:creator>
        </item>
        <item>
            <title><![CDATA[Delegated Credentials for TLS]]></title>
            <link>https://blog.cloudflare.com/keyless-delegation/</link>
            <pubDate>Fri, 01 Nov 2019 13:00:00 GMT</pubDate>
            <description><![CDATA[ Announcing support for a new cryptographic protocol making it possible to deploy encrypted services while still maintaining performance and control of private keys: Delegated Credentials for TLS.  ]]></description>
            <content:encoded><![CDATA[ <p></p><p>Today we’re happy to announce support for a new cryptographic protocol that helps make it possible to deploy encrypted services in a global network while still maintaining fast performance and tight control of private keys: Delegated Credentials for TLS. We have been working with partners from Facebook, Mozilla, and the broader IETF community to define this emerging standard. We’re excited to share the gory details today in this blog post.</p><p>Also, be sure to check out the blog posts on the topic by our friends at <a href="https://engineering.fb.com/security/delegated-credentials-improving-the-security-of-tls-certificates/">Facebook</a> and <a href="https://blog.mozilla.org/security/2019/11/01/validating-delegated-credentials-for-tls-in-firefox/">Mozilla</a>!</p>
    <div>
      <h2>Deploying TLS globally</h2>
      <a href="#deploying-tls-globally">
        
      </a>
    </div>
    <p>Many of the technical problems we face at Cloudflare are widely shared problems across the Internet industry. As gratifying as it can be to solve a problem for ourselves and our customers, it can be even more gratifying to solve a problem for the entire Internet. For the past three years, we have been working with peers in the industry to solve a specific shared problem in the TLS infrastructure space: How do you terminate TLS connections while storing keys remotely and maintaining performance and availability? Today we’re announcing that Cloudflare now supports Delegated Credentials, the result of this work.</p><p>Cloudflare’s TLS/SSL features are among the top reasons customers use our service. Configuring TLS is hard to do without internal expertise. By automating TLS, web site and web service operators gain the latest TLS features and the most secure configurations by default. It also reduces the risk of outages or bad press due to misconfigured or insecure encryption settings. Customers also gain early access to unique features like <a href="/introducing-tls-1-3/">TLS 1.3</a>, <a href="/towards-post-quantum-cryptography-in-tls/">post-quantum cryptography</a>, and <a href="/high-reliability-ocsp-stapling/">OCSP stapling</a> as they become available.</p><p>Unfortunately, for web services to authorize a service to terminate TLS for them, they have to trust the service with their private keys, which demands a high level of trust. For services with a global footprint, there is an additional level of nuance. 
They may operate multiple data centers located in places with varying levels of physical security, and each of these needs to be trusted to terminate TLS.</p><p>To tackle these problems of trust, Cloudflare has invested in two technologies: <a href="/announcing-keyless-ssl-all-the-benefits-of-cloudflare-without-having-to-turn-over-your-private-ssl-keys/">Keyless SSL</a>, which allows customers to use Cloudflare without sharing their private key with Cloudflare; and <a href="/introducing-cloudflare-geo-key-manager/">Geo Key Manager</a>, which allows customers to choose the geographical locations in which Cloudflare should keep their keys. Both of these technologies can be deployed without any changes to browsers or other clients. They also come with some downsides in the form of availability and performance degradation.</p><p>Keyless SSL introduces extra latency at the start of a connection. In order for a server without access to a private key to establish a connection with a client, that server needs to reach out to a key server, or a remote point of presence, and ask it to perform a private key operation. This not only adds additional latency to the connection, causing content to load more slowly, but it also introduces some troublesome operational constraints on the customer. Specifically, the server with access to the key needs to be highly available or the connection can fail. Sites often use Cloudflare to improve their site’s availability, so having to run a high-availability key server is an unwelcome requirement.</p>
    <div>
      <h2>Turning a pull into a push</h2>
      <a href="#turning-a-pull-into-a-push">
        
      </a>
    </div>
    <p>The reason services like Keyless SSL that rely on remote keys are so brittle is their architecture: they are pull-based rather than push-based. Every time a client attempts a handshake with a server that doesn’t have the key, it needs to pull the authorization from the key server. An alternative way to build this sort of system is to periodically push a short-lived authorization key to the server and use that for handshakes. Switching from a pull-based model to a push-based model eliminates the additional latency, but it comes with additional requirements, including the need to change the client.</p><p>Enter the new TLS feature of <a href="https://tools.ietf.org/html/draft-ietf-tls-subcerts-04">Delegated Credentials</a> (DCs). A delegated credential is a short-lived key that the certificate’s owner has delegated for use in TLS. They work like a power of attorney: your server authorizes our server to terminate TLS for a limited time. When a browser that supports this protocol connects to our edge servers, we can show it this “power of attorney”, instead of needing to reach back to a customer’s server to get it to authorize the TLS connection. This reduces latency and improves performance and reliability.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1wNKw1iaNaUHBESq06OXIK/fbbba3ca4614c398480a03e7ce00fc1b/pull-diagram-1.jpg" />
            
            </figure><p>The pull model</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/qfW3dQvRSnHowxNEW2lSf/d457a3d0b7c52f9c3523c7f2d73cd94a/push-diagram.jpg" />
            
            </figure><p>The push model</p><p>A fresh delegated credential can be created and pushed out to TLS servers long before the previous credential expires. Momentary blips in availability will not lead to broken handshakes for clients that support delegated credentials. Furthermore, a Delegated Credentials-enabled TLS connection is just as fast as a standard TLS connection: there’s no need to connect to the key server for every handshake. This removes the main drawback of Keyless SSL for DC-enabled clients.</p><p>Delegated credentials are intended to be an Internet Standard RFC that anyone can implement and use, not a replacement for Keyless SSL. Since browsers will need to be updated to support the standard, proprietary mechanisms like Keyless SSL and Geo Key Manager will continue to be useful. Delegated credentials aren’t just useful in our context, which is why we’ve developed them openly and with contributions from across industry and academia. Facebook has integrated them into their own TLS implementation, and you can read more about how they view the security benefits <a href="https://engineering.fb.com/security/delegated-credentials/">here</a>. When it comes to improving the security of the Internet, we’re all on the same team.</p><p><i>"We believe delegated credentials provide an effective way to boost security by reducing certificate lifetimes without sacrificing reliability. This will soon become an Internet standard and we hope others in the industry adopt delegated credentials to help make the Internet ecosystem more secure."</i></p><p></p><p>— <b>Subodh Iyengar</b>, software engineer at Facebook</p>
    <div>
      <h2>Extensibility beyond the PKI</h2>
      <a href="#extensibility-beyond-the-pki">
        
      </a>
    </div>
    <p>At Cloudflare, we’re interested in pushing the state of the art forward by experimenting with new algorithms. In TLS, there are three main areas of experimentation: ciphers, key exchange algorithms, and authentication algorithms. Ciphers and key exchange algorithms are only dependent on two parties: the client and the server. This freedom allows us to deploy exciting new choices like <a href="/do-the-chacha-better-mobile-performance-with-cryptography/">ChaCha20-Poly1305</a> or <a href="/towards-post-quantum-cryptography-in-tls/">post-quantum key agreement</a> in lockstep with browsers. On the other hand, the authentication algorithms used in TLS are dependent on certificates, which introduces certificate authorities and the entire public key infrastructure into the mix.</p><p>Unfortunately, the public key infrastructure is very conservative in its choice of algorithms, making it harder to adopt newer cryptography for authentication algorithms in TLS. For instance, <a href="https://en.wikipedia.org/wiki/EdDSA">EdDSA</a>, a highly-regarded signature scheme, is not supported by certificate authorities, and <a href="https://www.mozilla.org/en-US/about/governance/policies/security-group/certs/policy/">root programs limit the certificates that will be signed.</a> With the emergence of quantum computing, experimenting with new algorithms is essential to determine which solutions are deployable and functional on the Internet.</p><p>Since delegated credentials introduce the ability to use new authentication key types without requiring changes to certificates themselves, this opens up a new area of experimentation. Delegated credentials can be used to provide a level of flexibility in the transition to post-quantum cryptography, by enabling new algorithms and modes of operation to coexist with the existing PKI infrastructure. It also enables tiny victories, like the ability to use smaller, faster Ed25519 signatures in TLS.</p>
    <div>
      <h2>Inside DCs</h2>
      <a href="#inside-dcs">
        
      </a>
    </div>
    <p>A delegated credential contains a public key and an expiry time. This bundle is then signed with the certificate’s private key, over both the credential and the certificate itself, binding the delegated credential to the certificate for which it is acting as “power of attorney”. A client indicates its support for delegated credentials by including an extension in its Client Hello.</p><p>A server that supports delegated credentials composes the TLS Certificate Verify and Certificate messages as usual, but instead of signing with the certificate’s private key, it includes the certificate along with the DC, and signs with the DC’s private key. Therefore, the private key of the certificate only needs to be used for the signing of the DC.</p><p>Certificates used for signing delegated credentials require a special X.509 certificate extension (currently only available at <a href="https://docs.digicert.com/manage-certificates/certificate-profile-options/">DigiCert</a>). This requirement exists to avoid breaking assumptions people may have about the impact of temporary access to their keys on security, particularly in cases involving HSMs and the still unfixed <a href="/rfc-8446-aka-tls-1-3/">Bleichenbacher oracles</a> in older TLS versions. Temporary access to a key can enable signing lots of delegated credentials that start far in the future, and as a result support was made opt-in. Early versions of QUIC had <a href="https://www.nds.ruhr-uni-bochum.de/media/nds/veroeffentlichungen/2015/08/21/Tls13QuicAttacks.pdf">similar issues</a>, and ended up adopting TLS to fix them. Protocol evolution on the Internet requires working well with already existing protocols and their flaws.</p>
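The time-bound nature of a delegated credential can be sketched as a pair of checks: the credential must not have expired, and it must not claim a validity window longer than the protocol allows. This is a minimal illustration, not cfnts or NSS code; the field names, the 7-day cap from the draft, and the omission of the signature check over the certificate and credential are all simplifications.

```rust
// Sketch of the client-side freshness check for a delegated credential (DC).
// A DC's validity is expressed relative to the delegating certificate; here
// `valid_time` is seconds after the certificate's notBefore. The signature
// verification over the certificate and credential is omitted entirely.

const MAX_VALIDITY_SECS: u64 = 7 * 24 * 3600; // the draft caps DC lifetime at 7 days

/// Returns true if the DC is currently acceptable: not expired, and not
/// valid for longer than the protocol allows.
fn dc_time_valid(cert_not_before: u64, valid_time: u64, now: u64) -> bool {
    let expiry = cert_not_before.saturating_add(valid_time);
    valid_time <= MAX_VALIDITY_SECS && now < expiry
}

fn main() {
    // A credential issued at t=1000 with a 24-hour window, checked at t=2000.
    assert!(dc_time_valid(1000, 24 * 3600, 2000));
    // The same credential checked at its expiry instant is rejected.
    assert!(!dc_time_valid(1000, 24 * 3600, 1000 + 24 * 3600));
    // A credential claiming a 30-day window exceeds the cap.
    assert!(!dc_time_valid(1000, 30 * 24 * 3600, 2000));
}
```

Because the expiry is baked into the signed credential itself, a compromised DC key ages out on its own, which is what makes the short validity window the core of the design.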
    <div>
      <h2>Delegated Credentials at Cloudflare and Beyond</h2>
      <a href="#delegated-credentials-at-cloudflare-and-beyond">
        
      </a>
    </div>
    <p>Currently we use delegated credentials as a performance optimization for Geo Key Manager and Keyless SSL. Customers can update their certificates to include the special extension for delegated credentials, and we will automatically create delegated credentials and distribute them to the edge through Keyless SSL or Geo Key Manager. For more information, see the <a href="https://developers.cloudflare.com/ssl/keyless-ssl/dc/">documentation.</a> It also enables us to be more conservative about where we keep keys for customers, improving our security posture.</p><p>Delegated credentials would be useless if they weren’t also supported by browsers and other HTTP clients. Christopher Patton, a former intern at Cloudflare, implemented support in Firefox and its underlying NSS security library. <a href="https://blog.mozilla.org/security/2019/11/01/validating-delegated-credentials-for-tls-in-firefox/">This feature is now in the Nightly versions of Firefox</a>. You can turn it on by activating the configuration option security.tls.enable_delegated_credentials at about:config. Studies are ongoing on how effective this will be in a wider deployment. There is also support for delegated credentials in BoringSSL.</p><p><i>"At Mozilla we welcome ideas that help to make the Web PKI more robust. The Delegated Credentials feature can help to provide secure and performant TLS connections for our users, and we're happy to work with Cloudflare to help validate this feature."</i></p><p></p><p>— <b>Thyla van der Merwe</b>, Cryptography Engineering Manager at Mozilla</p><p>One open issue is the question of client clock accuracy. Until we have a wide-scale study we won’t know how many connections using delegated credentials will break because of the 24-hour time limit that is imposed.  
Some clients, in particular mobile clients, may have inaccurately set clocks, the root cause of one third of all <a href="https://www.cloudflare.com/learning/ssl/common-errors/">certificate errors</a> in Chrome. Part of the way that we’re aiming to solve this problem is through standardizing and improving <a href="/roughtime/">Roughtime</a>, so web browsers and other services that need to validate certificates can do so independent of the client clock.</p><p>Cloudflare’s global scale means that we see connections from every corner of the world, and from many different kinds of connection and device. That reach enables us to find rare problems with the deployability of protocols. For example, our <a href="/why-tls-1-3-isnt-in-browsers-yet/">early deployment</a> helped inform the development of the TLS 1.3 standard. As we enable developing protocols like delegated credentials, we learn about obstacles that inform and affect their future development.</p>
    <div>
      <h2>Conclusion</h2>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>As new protocols emerge, we'll continue to play a role in their development and bring their benefits to our customers. Today’s announcement of a technology that overcomes some limitations of Keyless SSL is just one example of how Cloudflare takes part in improving the Internet not just for our customers, but for everyone. During the standardization process of turning the draft into an RFC, we’ll continue to maintain our implementation and come up with new ways to apply delegated credentials.</p> ]]></content:encoded>
            <category><![CDATA[Crypto Week]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Keyless SSL]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Research]]></category>
            <category><![CDATA[Cryptography]]></category>
            <guid isPermaLink="false">21MHSnISq1AaWWdB5lruxJ</guid>
            <dc:creator>Nick Sullivan</dc:creator>
            <dc:creator>Watson Ladd</dc:creator>
        </item>
        <item>
            <title><![CDATA[Announcing cfnts: Cloudflare's implementation of NTS in Rust]]></title>
            <link>https://blog.cloudflare.com/announcing-cfnts/</link>
            <pubDate>Thu, 31 Oct 2019 13:00:00 GMT</pubDate>
            <description><![CDATA[ Several months ago we announced that we were providing a new public time service. Part of what we were providing was the first major deployment of the new Network Time Security protocol, with a newly written implementation of NTS in Rust.  ]]></description>
            <content:encoded><![CDATA[ <p></p><p>Several months ago we announced that we were providing a <a href="/secure-time/">new public time service.</a> Part of what we were providing was the first major deployment of the new Network Time Security (NTS) protocol, with a newly written implementation of NTS in Rust. In the process, we received helpful advice from the NTP community, especially from the NTPSec and Chrony projects. We’ve also participated in several interoperability events. Now we are returning something to the community: Our implementation, cfnts, is now <a href="https://github.com/cloudflare/cfnts">open source</a> and we welcome your pull requests and issues.</p><p>The journey from a blank source file to a working, deployed service was a lengthy one, and it involved many people across multiple teams.</p><hr /><p><i>"Correct time is a necessity for most security protocols in use on the Internet. Despite this, secure time transfer over the Internet has previously required complicated configuration on a case by case basis. With the introduction of NTS, secure time synchronization will finally be available for everyone. It is a small, but important, step towards increasing security in all systems that depend on accurate time. I am happy that Cloudflare are sharing their NTS implementation. A diversity of software with NTS support is important for quick adoption of the new protocol."</i></p><p></p><p>— <b>Marcus Dansarie</b>, coauthor of the <a href="https://datatracker.ietf.org/doc/draft-ietf-ntp-using-nts-for-ntp/">NTS specification</a></p><hr />
    <div>
      <h2>How NTS works</h2>
      <a href="#how-nts-works">
        
      </a>
    </div>
    <p>NTS is structured as a suite of two sub-protocols as shown in the figure below. The first is the Network Time Security Key Exchange (NTS-KE), which is always conducted over Transport Layer Security (TLS) and handles the creation of key material and parameter negotiation for the second protocol. The second is <a href="https://tools.ietf.org/html/rfc5905">NTPv4</a>, the current version of the NTP protocol, which allows the client to synchronize their time from the remote server.</p><p>In order to maintain the scalability of NTPv4, it was important that the server not maintain per-client state. A very small server can serve millions of NTP clients. Maintaining this property while providing security is achieved with cookies that the server provides to the client that contain the server state.</p><p>In the first stage, the client sends a request to the NTS-KE server and gets a response via TLS. This exchange carries out a number of functions:</p><ul><li><p>Negotiates the <a href="https://en.wikipedia.org/wiki/Authenticated_encryption">AEAD</a> algorithm to be used in the second stage.</p></li><li><p>Negotiates the second protocol. Currently, the standard only defines how NTS works with NTPv4.</p></li><li><p>Negotiates the NTP server IP address and port.</p></li><li><p>Creates cookies for use in the second stage.</p></li><li><p>Creates two symmetric keys (C2S and S2C) from the TLS session via exporters.</p></li></ul>
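The cookies created in this first stage are what keep the NTP server stateless: the server remembers only a master key, and each cookie carries the per-client AEAD keys (C2S and S2C) back to it. The sketch below shows only the pack/unpack round trip of that idea; it is illustrative, not cfnts code, and it deliberately omits the cryptography, since real NTS cookies are sealed with an AEAD under the server's master key so clients cannot read or forge them.

```rust
// Structural sketch of the NTS cookie idea: a cookie transports the
// per-client keys so the server holds no per-client state. A real server
// AEAD-encrypts and authenticates this payload under a master key; this
// toy shows only the layout and the round trip.

const KEY_LEN: usize = 32;

fn pack_cookie(c2s: &[u8; KEY_LEN], s2c: &[u8; KEY_LEN]) -> Vec<u8> {
    let mut out = Vec::with_capacity(2 * KEY_LEN);
    out.extend_from_slice(c2s);
    out.extend_from_slice(s2c);
    out // a real server would AEAD-seal this under its master key
}

fn unpack_cookie(cookie: &[u8]) -> Option<([u8; KEY_LEN], [u8; KEY_LEN])> {
    if cookie.len() != 2 * KEY_LEN {
        return None; // a real server would also reject on AEAD failure
    }
    let mut c2s = [0u8; KEY_LEN];
    let mut s2c = [0u8; KEY_LEN];
    c2s.copy_from_slice(&cookie[..KEY_LEN]);
    s2c.copy_from_slice(&cookie[KEY_LEN..]);
    Some((c2s, s2c))
}

fn main() {
    let c2s = [1u8; KEY_LEN];
    let s2c = [2u8; KEY_LEN];
    let cookie = pack_cookie(&c2s, &s2c);
    assert_eq!(unpack_cookie(&cookie), Some((c2s, s2c)));
    assert_eq!(unpack_cookie(&cookie[..10]), None); // truncated cookie rejected
}
```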
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6ELZCQXF1AwQPqScsl7VRM/3a9b9848f80df1c72126b485ebbba15d/overview-of-NTP-_2x-1.png" />
            
            </figure><p>In the second stage, the client securely synchronizes the clock with the negotiated NTP server. To synchronize securely, the client sends NTPv4 packets with four special extensions:</p><ul><li><p><i>Unique Identifier Extension</i> contains a random nonce used to prevent replay attacks.</p></li><li><p><i>NTS Cookie Extension</i> contains one of the cookies that the client stores. Since currently only the client remembers the two AEAD keys (C2S and S2C), the server needs to use the cookie from this extension to extract the keys. Each cookie contains the keys encrypted under a secret key the server has.</p></li><li><p><i>NTS Cookie Placeholder Extension</i> is a signal from the client to request additional cookies from the server. This extension is needed to make sure that the response is not much longer than the request to prevent amplification attacks.</p></li><li><p><i>NTS Authenticator and Encrypted Extension Fields Extension</i> contains a ciphertext from the AEAD algorithm with C2S as a key and with the NTP header, timestamps, and all the previously mentioned extensions as associated data. Other possible extensions can be included as encrypted data within this field. Without this extension, the timestamp can be spoofed.</p></li></ul><p>After getting a request, the server sends a response back to the client echoing the <i>Unique Identifier Extension</i> to prevent replay attacks, the <i>NTS Cookie Extension</i> to provide the client with more cookies, and the <i>NTS Authenticator and Encrypted Extension Fields Extension</i> with an AEAD ciphertext with S2C as a key. But in the server response, instead of sending the <i>NTS Cookie Extension</i> in plaintext, it needs to be encrypted with the AEAD to provide unlinkability of the NTP requests.</p><p>The second handshake can be repeated many times without going back to the first stage since each request and response gives the client a new cookie. 
The expensive public key operations in TLS are thus amortized over a large number of requests. Furthermore, specialized timekeeping devices like FPGA implementations only need to implement a few symmetric cryptographic functions and can delegate the complex TLS stack to a different device.</p>
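The synchronization itself, with or without NTS, rests on the four NTPv4 timestamps: t1 when the client sends, t2 when the server receives, t3 when the server replies, and t4 when the client receives the reply. RFC 5905 derives the clock offset and round-trip delay from them, assuming the network delay is symmetric. A minimal sketch, with times as plain f64 seconds purely for illustration:

```rust
// RFC 5905 clock offset and round-trip delay from the four NTP timestamps.

/// How far the client clock is behind (+) or ahead (-) of the server,
/// assuming the one-way delays are symmetric.
fn offset(t1: f64, t2: f64, t3: f64, t4: f64) -> f64 {
    ((t2 - t1) + (t3 - t4)) / 2.0
}

/// Round-trip network delay, excluding server processing time.
fn delay(t1: f64, t2: f64, t3: f64, t4: f64) -> f64 {
    (t4 - t1) - (t3 - t2)
}

fn main() {
    // Client clock 5s slow, 0.1s one-way delay, 0.02s server processing.
    let (t1, t2, t3, t4) = (10.0, 15.1, 15.12, 10.22);
    assert!((offset(t1, t2, t3, t4) - 5.0).abs() < 1e-9);
    assert!((delay(t1, t2, t3, t4) - 0.2).abs() < 1e-9);
}
```

What NTS adds is not a change to these formulas but a guarantee that the timestamps they consume were not spoofed or replayed in transit.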
    <div>
      <h2>Why Rust?</h2>
      <a href="#why-rust">
        
      </a>
    </div>
    <p>While many of our services are written in <a href="/tag/go/">Go</a>, and we have considerable experience on the Crypto team with Go, a garbage collection pause in the middle of responding to an NTP packet would negatively impact accuracy. We picked <a href="/tag/rust/">Rust</a> because of its zero-overhead abstractions and other useful features.</p><ul><li><p><b>Memory safety</b> After <a href="/answering-the-critical-question-can-you-get-private-ssl-keys-using-heartbleed/">Heartbleed</a>, <a href="/incident-report-on-memory-leak-caused-by-cloudflare-parser-bug/">Cloudbleed</a>, and the <a href="https://docs.microsoft.com/en-us/security-updates/securitybulletins/2009/ms09-044">steady</a> <a href="https://googleprojectzero.blogspot.com/2019/08/in-wild-ios-exploit-chain-1.html">drip</a> <a href="https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=%22heap+overflow%22">of</a> <a href="https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=%22buffer+overflow%22">vulnerabilities</a> caused by C’s lack of memory safety, it’s clear that C is not a good choice for new software dealing with untrusted inputs. The obvious solution for memory safety is to use garbage collection, but garbage collection has a substantial runtime cost, while Rust provides memory safety with much less runtime overhead.</p></li><li><p><b>Non-nullability</b> Null pointers are an edge case that is frequently not handled properly. Rust explicitly marks optionality, so all references in Rust can be safely dereferenced. The type system ensures that option types are properly handled.</p></li><li><p><b>Thread safety</b>  Data-race prevention is another key feature of Rust. Rust’s ownership model ensures that all cross-thread accesses are synchronized by default. While not a panacea, this eliminates a major class of bugs.</p></li><li><p><b>Immutability</b> Separating types into mutable and immutable is very important for reducing bugs. 
For example, in Java, when you pass an object into a function as a parameter, after the function is finished, you will never know whether the object has been mutated or not. Rust allows you to pass the object reference into the function and still be assured that the object is not mutated.</p></li><li><p><b>Error handling</b>  Rust result types help with ensuring that operations that can produce errors are identified and a choice made about the error, even if that choice is passing it on.</p></li></ul><p>While Rust provides safety with zero overhead, coding in Rust involves understanding linear types and for us a new language. In this case the importance of security and performance meant we chose Rust over a potentially easier task in Go.</p>
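The non-nullability and error-handling points above can be shown in a few lines. This is an illustrative fragment, not cfnts code: parsing the NTP version out of the packet's first header byte returns a Result, so the caller is forced by the type system to handle the failure case, with no null to forget about.

```rust
// NTP packs leap indicator (2 bits), version (3 bits), and mode (3 bits)
// into the first header byte; the version occupies bits 3..6.

/// Extract and validate the NTP version. The error case is a value the
/// caller must inspect, not an exception or a null.
fn parse_version(first_byte: u8) -> Result<u8, String> {
    let version = (first_byte >> 3) & 0x07;
    match version {
        3 | 4 => Ok(version),
        v => Err(format!("unsupported NTP version {}", v)),
    }
}

fn main() {
    // 0x23 = LI 0, version 4, mode 3 (client).
    assert_eq!(parse_version(0x23), Ok(4));
    // Version 7 comes back as an error value, not a crash.
    assert!(parse_version(0x3B).is_err());
}
```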
    <div>
      <h2>Dependencies we use</h2>
      <a href="#dependencies-we-use">
        
      </a>
    </div>
    <p>Because of our scale, and for DDoS protection, we needed a highly scalable server. For UDP protocols without the concept of a connection, the server can respond to one packet at a time easily, but for TCP this is more complex. Originally we thought about using <a href="https://github.com/tokio-rs/tokio">Tokio</a>. However, at the time Tokio suffered from scheduler <a href="https://github.com/tokio-rs/tokio/issues/449">problems that had caused other teams some issues</a>. As a result we decided to use <a href="https://github.com/tokio-rs/mio">Mio</a> directly, basing our work on the examples in <a href="https://github.com/ctz/rustls">Rustls</a>.</p><p>We decided to use Rustls over OpenSSL or BoringSSL because of the crate's consistent error codes and default support for authentication that is difficult to disable accidentally. While there are some features that are not yet supported, it got the job done for our service.</p>
    <div>
      <h2>Other engineering choices</h2>
      <a href="#other-engineering-choices">
        
      </a>
    </div>
    <p>More important than our choice of programming language was our implementation strategy. A working, fully featured NTP implementation is a complicated program involving a <a href="https://en.wikipedia.org/wiki/Phase-locked_loop">phase-locked loop.</a> These have a difficult reputation due to their nonlinear nature, beyond the usual complexities of closed loop control. The response of a phase-locked loop to a disturbance can be estimated if the loop is locked and the disturbance is small. However, lock acquisition, large disturbances, and the necessary filtering in NTP are all hard to analyze mathematically since they are not captured in the linear models applied for small-scale analysis. While NTP works with the total phase, unlike the phase-locked loops of electrical engineering, there are still nonlinear elements. In NTP, changes to this loop require weeks of operation to evaluate, as the loop responds very slowly.</p><p>Computer clocks are generally accurate over short periods, while networks are plagued with inconsistent delays. This <a href="https://tf.nist.gov/general/pdf/1551.pdf">demands a slow response</a>. Changes we make to our service have taken hours to have an effect, as the clients slowly adapt to the new conditions. While <a href="https://tools.ietf.org/html/rfc5905">RFC 5905</a> provides lots of details on an algorithm to adjust the clock, later implementations such as <a href="https://chrony.tuxfamily.org/">chrony</a> have improved upon the algorithm through much more sophisticated nonlinear filters.</p><p>Rather than implement these more sophisticated algorithms, we let chrony adjust our servers’ clocks, copy the state variables for the header from chrony, and adjust the dispersion and root delay according to the formulas given in the RFC. This strategy let us focus on the new protocols.</p>
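One piece of that bookkeeping can be sketched concretely. This is illustrative, not the cfnts code: the root dispersion copied from chrony must be aged before it is advertised in a reply, because a clock's error bound grows with time since the last update, and RFC 5905 grows it at the frequency-tolerance rate PHI (15 ppm).

```rust
// Ageing the root dispersion per RFC 5905: the advertised error bound
// grows at PHI seconds per second since the state was last refreshed.

const PHI: f64 = 15e-6; // frequency tolerance, seconds per second

/// Root dispersion to advertise `elapsed` seconds after the state
/// variables were last refreshed from the reference clock.
fn aged_dispersion(base_dispersion: f64, elapsed: f64) -> f64 {
    base_dispersion + PHI * elapsed
}

fn main() {
    // 1 ms of dispersion, refreshed 100 s ago, gains 1.5 ms of ageing.
    let d = aged_dispersion(0.001, 100.0);
    assert!((d - 0.0025).abs() < 1e-12);
}
```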
    <div>
      <h2>Prague</h2>
      <a href="#prague">
        
      </a>
    </div>
    <p>Part of what the Internet Engineering Task Force (IETF) does is organize events like hackathons where implementers of a new standard can get together and try to make their stuff work with one another. This exposes bugs and infelicities of language in the standard and the implementations. We attended the IETF 104 hackathon to develop our server and make it work with other implementations. The NTP working group members were extremely generous with their time, and during the process we uncovered a few issues relating to the exact way one has to handle ALPN with older OpenSSL versions.</p><p>At the IETF 104 in Prague we had a working client and server for NTS-KE by the end of the hackathon. This was a good amount of progress considering we started with nothing. However, without implementing NTP we didn’t actually know that our server and client were computing the right thing. That would have to wait for later rounds of testing.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1YlPzBw5H4kOUfQWJfgFXq/486254883416f6509d7abd54ec911ff1/Screen-Shot-2019-10-28-at-12.58.07-PM.png" />
            
            </figure><p>Wireshark during some NTS debugging</p>
    <div>
      <h2>Crypto Week</h2>
      <a href="#crypto-week">
        
      </a>
    </div>
    <p>As <a href="/secure-time/">Crypto Week 2019</a> approached we were busily writing code. All of the NTP protocol had to be implemented, together with the connection between the NTP and NTS-KE parts of the server. We also had to deploy processes to synchronize the ticket-encrypting keys around the world and work on reconfiguring our own timing infrastructure to support this new service.</p><p>With a few weeks to go we had a working implementation, but we needed servers and clients out there to test with. However, because our server only supports TLS 1.3, which had only just landed in OpenSSL, there were some compatibility problems.</p><p>We ended up compiling a chrony branch with NTS support and NTPsec ourselves and testing against time.cloudflare.com. We also tested our client against test servers set up by the chrony and NTPsec projects, in the hopes that this would expose bugs and have our implementations work nicely together. After a few lengthy days of debugging, we found out that our nonce length wasn’t exactly in accordance with the spec, which was quickly fixed. The NTPsec project was extremely helpful in this effort. Of course, this was the day that our <a href="https://sfist.com/2019/05/31/power-outage-hits-soma-china-basin/">office had a blackout</a>, so the testing happened outside in Yerba Buena Gardens.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3JaJ1t9QsDElwkgvBo5Uo/94b282d0f621c730e467495d03a486d2/pasted-image-0-3.png" />
            
            </figure><p>Yerba Buena commons. Taken by Wikipedia user Beyond My Ken. CC-BY-SA</p><p>During the deployment of time.cloudflare.com, we had to open up our firewall to incoming NTP packets. Because of NTP reflection attacks, we had kept UDP port 123 closed on our routers since the start of Cloudflare’s network. Since clients sometimes also use source port 123 to send NTP packets, it’s impossible for NTP servers to filter reflection attacks without parsing the contents of the NTP packet, which routers have difficulty doing. In order to protect Cloudflare infrastructure we got an entire subnet just for the time service, so it could be aggressively throttled and rerouted in case of massive DDoS attacks. This is an exceptional case: most edge services at Cloudflare run on every available IP.</p>
    <div>
      <h2>Bug fixes</h2>
      <a href="#bug-fixes">
        
      </a>
    </div>
    <p>Shortly after the public launch, we discovered that older Windows versions shipped with NTP version 3, and our server only spoke version 4. This was easy to fix since the timestamp fields have not moved between NTP versions: we echo the client’s version back, and the remaining NTP version 3 clients understand the response.</p><p>Also tricky was the failure of Network Time Foundation ntpd clients to expand the polling interval. It turns out that one has to echo back the client’s polling interval to have it expand. Chrony does not use the polling interval from the server, and so was not affected by this incompatibility.</p><p>Both of these issues were fixed in ways suggested by other NTP implementers who had run into these problems themselves. We thank Miroslav Lichter tremendously for telling us exactly what the problem was, and the members of the Cloudflare community who posted packet captures demonstrating these issues.</p>
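The version fix above is small enough to sketch. This fragment is illustrative, not the actual cfnts patch: rather than always stamping version 4 on replies, the server echoes a version-3 client's version back so older Windows clients accept the response.

```rust
// Echo a v3 client's version in the reply; answer everything else as v4.
fn response_version(client_version: u8) -> u8 {
    if client_version == 3 { 3 } else { 4 }
}

fn main() {
    assert_eq!(response_version(3), 3); // older Windows NTPv3 clients
    assert_eq!(response_version(4), 4); // modern NTPv4 clients
}
```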
    <div>
      <h2>Continued improvement</h2>
      <a href="#continued-improvement">
        
      </a>
    </div>
    <p>The original production version of cfnts was not particularly object-oriented and several contributors were just learning Rust. As a result there was quite a bit of unwrap and unnecessary mutability flying around. Much of the code was in functions even when it could profitably be attached to structures. All of this had to be restructured. Keep in mind that some of the best code running in the real world has been written, rewritten, and sometimes rewritten again! This is actually a good thing.</p><p>As an internal project we relied on Cloudflare’s internal tooling for building, testing, and deploying code. These were replaced with tools available to everyone like Docker to ensure anyone can contribute. Our repository is integrated with Circle CI, ensuring that all contributions are automatically tested. In addition to unit tests we test the entire end-to-end functionality of getting a measurement of the time from a server.</p>
    <div>
      <h2>The Future</h2>
      <a href="#the-future">
        
      </a>
    </div>
    <p>NTPsec has already released support for NTS but we see very little usage. Please try turning on NTS if you use NTPsec and see how it works with time.cloudflare.com.  As the draft advances through the standards process the protocol will undergo an incompatible change when the identifiers are updated and assigned out of the IANA registry instead of being experimental ones, so this is very much an experiment. Note that your daemon will need TLS 1.3 support and so could require manually compiling OpenSSL and then linking against it.</p><p>We’ve also added our time service to the public NTP pool. The NTP pool is a widely used volunteer-maintained service that provides NTP servers geographically spread across the world. Unfortunately, NTS doesn’t currently work well with the pool model, so for the best security, we recommend enabling NTS and using time.cloudflare.com and other NTS supporting servers.</p><p>In the future, we’re hoping that more clients support NTS, and have licensed our code liberally to enable this. We would love to hear if you incorporate it into a product and welcome contributions to make it more useful.</p><p>We’re also encouraged to see that Netnod has a <a href="https://www.netnod.se/time-and-frequency/netnod-launch-one-of-the-first-nts-enabled-time-services-in-the-world">production NTS service</a> at nts.ntp.se. The more time services and clients that adopt NTS, the more secure the Internet will be.</p>
    <div>
      <h2>Acknowledgements</h2>
      <a href="#acknowledgements">
        
      </a>
    </div>
    <p>Tanya Verma and <a href="/author/gabbi/">Gabbi Fisher</a> were major contributors to the code, especially the configuration system and the client code. We’d also like to thank Gary Miller, Miroslav Lichter, and all the people at Cloudflare who set up their laptops and home machines to point to time.cloudflare.com for early feedback.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/0pc2IWY3wfq8UQuTPPMAB/2d7651cc321580194ca41bbbb6b3e7e5/tales-from-the-crypto-team_2x--1--1.png" />
            
            </figure> ]]></content:encoded>
            <category><![CDATA[Crypto Week]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Research]]></category>
            <category><![CDATA[Cryptography]]></category>
            <guid isPermaLink="false">1T4MGuuPm0kE0qsJuGUTw5</guid>
            <dc:creator>Watson Ladd</dc:creator>
            <dc:creator>Pop Chunhapanya</dc:creator>
        </item>
    </channel>
</rss>