
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Tue, 14 Apr 2026 15:22:44 GMT</lastBuildDate>
        <item>
            <title><![CDATA[eBPF, Sockets, Hop Distance and manually writing eBPF assembly]]></title>
            <link>https://blog.cloudflare.com/epbf_sockets_hop_distance/</link>
            <pubDate>Thu, 29 Mar 2018 10:43:38 GMT</pubDate>
            <description><![CDATA[ A friend gave me an interesting task: extract IP TTL values from TCP connections established by a userspace program. This seemingly simple task quickly exploded into an epic Linux system programming hack.  ]]></description>
            <content:encoded><![CDATA[ <p>A friend gave me an interesting task: extract IP TTL values from TCP connections established by a userspace program. This seemingly simple task quickly exploded into an epic Linux system programming hack. The resulting code is grossly over-engineered, but boy, did we learn plenty in the process!</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1UrWjrMBPW4l3ipvThL0sn/1e78cd221cc63a5fb9964b3bd55cd11d/3845353725_7d7c624f34_z.jpg" />
            
            </figure><p><a href="https://creativecommons.org/licenses/by-sa/2.0/">CC BY-SA 2.0</a> <a href="https://www.flickr.com/photos/paulmiller/3845353725/">image</a> by <a href="https://www.flickr.com/photos/paulmiller">Paul Miller</a></p>
    <div>
      <h3>Context</h3>
      <a href="#context">
        
      </a>
    </div>
    <p>You may wonder why she wanted to inspect the TTL packet field (formally "Time to Live (TTL)" in IPv4, or "Hop Limit" in IPv6). The reason is simple: she wanted to ensure that the connections are routed outside of our datacenter. The "Hop Distance" - the difference between the TTL value set by the originating machine and the TTL value in the packet received at its destination - shows how many routers the packet crossed. If a packet crossed two or more routers, we know it indeed came from outside of our datacenter.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6eYDA3vNOF8Cc9ytLOPo9N/c4862fac898725e1bd54b0284977f251/Screen-Shot-2018-03-29-at-10.52.49-AM-1.png" />
            
            </figure><p>It's uncommon to look at TTL values (except for their intended purpose of mitigating routing loops by checking when the TTL reaches zero). The normal way to deal with the problem we had would be to blocklist the IP ranges of our servers. But it’s not that simple in our setup. Our IP numbering configuration is rather baroque, with plenty of Anycast, Unicast and Reserved IP ranges. Some belong to us, some don't. We wanted to avoid having to maintain a hard-coded blocklist of IP ranges.</p><p>The gist of the idea is: we want to note the TTL value from a returned SYN+ACK packet. With this number we can estimate the Hop Distance - the number of routers on the path. If the Hop Distance is:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1LCtnMJcT3dIXlGwBrWf69/0dd6441edca81092fe6c536203657e21/Screen-Shot-2018-03-29-at-10.50.38-AM.png" />
            
            </figure><ul><li><p><b>zero</b>: we know the connection went to localhost or a local network.</p></li></ul>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2ugFXo390P0VMXOpsvqtVQ/04fc3c2133fbcfaa40db47655b131ab3/Screen-Shot-2018-03-29-at-10.49.42-AM.png" />
            
            </figure><ul><li><p><b>one</b>: connection went through our router, and was terminated just behind it.</p></li></ul>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5dwa36qw1gHyNNLKJ24UiX/a3ee616930c1335d4e6f8dc3fa7da5bb/Screen-Shot-2018-03-29-at-10.49.48-AM.png" />
            
            </figure><ul><li><p><b>two</b>: connection went through two routers. Most likely our router, and one right next to it.</p></li></ul><p>For our use case, we want to see if the Hop Distance was two or more - this would ensure the connection was routed outside the datacenter.</p>
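            <p>As a rough illustration of the estimate, here is a minimal Go sketch. It assumes the sender started from one of the common initial TTL values (64, 128 or 255). That list and the <code>hopDistance</code> name are assumptions made for this example, not something taken from the program described in this post.</p>

```go
package main

import "fmt"

// commonInitialTTLs lists initial TTL values that operating systems
// commonly use. This is an assumption of the sketch; a sender may pick
// any value.
var commonInitialTTLs = []int{64, 128, 255}

// hopDistance guesses how many routers a packet crossed by assuming the
// sender used the smallest common initial TTL that is not below the
// observed one.
func hopDistance(observedTTL int) int {
	for _, initial := range commonInitialTTLs {
		if observedTTL <= initial {
			return initial - observedTTL
		}
	}
	return 0 // unreachable for a one-byte TTL field
}

func main() {
	for _, ttl := range []int{64, 63, 58, 125} {
		fmt.Printf("observed TTL %d: hop distance %d\n", ttl, hopDistance(ttl))
	}
}
```

            <p>The guess is only as good as the assumed initial values: a sender using an unusual initial TTL, or a path longer than the gap between two candidates, would be misjudged.</p>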
    <div>
      <h3>Not so easy</h3>
      <a href="#not-so-easy">
        
      </a>
    </div>
    <p>It's easy to read TTL values from a userspace application, right? No. It turns out it's almost impossible. Here are the theoretical options we considered early on:</p><p>A) Run a libpcap/tcpdump-like raw socket, and catch the SYN+ACKs manually. We ruled out this design quickly - it requires elevated privileges. Also, raw sockets are pretty fragile: they can suffer packet loss if the userspace application can’t keep up.</p><p>B) Use the IP_RECVTTL socket option. IP_RECVTTL requests that the TTL be attached as a "cmsg" to the control/ancillary data returned by a <code>recvmsg()</code> syscall. This is a good choice for UDP sockets, but this socket option is not supported by TCP SOCK_STREAM sockets.</p><p>Extracting the TTL is not so easy.</p>
    <div>
      <h3>SO_ATTACH_FILTER to rule the world!</h3>
      <a href="#so_attach_filter-to-rule-the-world">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1qX4zkQCTMRtJsnaqzysvp/a95be0f638ee764a86591c9576671281/315128991_d49c312fbc_z.jpg" />
            
            </figure><p><a href="https://creativecommons.org/licenses/by-sa/2.0/">CC BY-SA 2.0</a> <a href="https://www.flickr.com/photos/leejordan/315128991/">image</a> by <a href="https://www.flickr.com/photos/leejordan/">Lee Jordan</a></p><p>Wait, there is a third way!</p><p>You see, for quite some time it has been possible to attach a BPF filtering program to a socket. See <a href="http://man7.org/linux/man-pages/man7/socket.7.html"><code>socket(7)</code></a></p>
            <pre><code>SO_ATTACH_FILTER (since Linux 2.2), SO_ATTACH_BPF (since Linux 3.19)
    Attach a classic BPF (SO_ATTACH_FILTER) or an extended BPF
    (SO_ATTACH_BPF) program to the socket for use as a filter of
    incoming packets.  A packet will be dropped if the filter
    program returns zero.  If the filter program returns a nonzero
    value which is less than the packet's data length, the packet
    will be truncated to the length returned.  If the value
    returned by the filter is greater than or equal to the
    packet's data length, the packet is allowed to proceed
    unmodified.</code></pre>
            <p>You probably take advantage of SO_ATTACH_FILTER already: This is how tcpdump/wireshark does filtering when you're dumping packets off the wire.</p><p>How does it work? Depending on the result of a <a href="/bpf-the-forgotten-bytecode/">BPF program</a>, packets can be filtered, truncated or passed to the socket without modification. Normally SO_ATTACH_FILTER is used for RAW sockets, but surprisingly, BPF filters can also be attached to normal SOCK_STREAM and SOCK_DGRAM sockets!</p><p>We don't want to truncate packets though - we want to extract the TTL. Unfortunately with Classical BPF (cBPF) it's impossible to extract any data from a running BPF filter program.</p>
    <div>
      <h3>eBPF and maps</h3>
      <a href="#ebpf-and-maps">
        
      </a>
    </div>
    <p>This changed with modern BPF machinery, which includes:</p><ul><li><p>modernised eBPF bytecode</p></li><li><p>eBPF maps</p></li><li><p><a href="https://man7.org/linux/man-pages/man2/bpf.2.html"><code>bpf()</code> syscall</a></p></li><li><p>SO_ATTACH_BPF socket option</p></li></ul><p>eBPF bytecode can be thought of as an extension to Classical BPF, but it's the extra features that really let it shine.</p><p>The gem is the "map" abstraction. An eBPF map is a thingy that allows an eBPF program to store data and share it with userspace code. Think of an eBPF map as a data structure (most often a hash table) shared between a userspace program and an eBPF program running in kernel space.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/51BMjhXrkIIAlAatG0qsMh/bdd6b955f048ae0a2cd5f5005a460bfe/Screen-Shot-2018-03-29-at-10.59.43-AM.png" />
            
            </figure><p>To solve our TTL problem, we can use an eBPF filter program. It will look at the TTL values of passing packets, and save them in an eBPF map. Later, we can inspect the eBPF map and analyze the recorded values from userspace.</p>
    <div>
      <h3>SO_ATTACH_BPF to rule the world!</h3>
      <a href="#so_attach_bpf-to-rule-the-world">
        
      </a>
    </div>
    <p>To use eBPF we need a number of things set up. First, we need to create an "eBPF map". There <a href="https://elixir.bootlin.com/linux/v4.15.13/source/include/uapi/linux/bpf.h#L99">are many specialized map types</a>, but for our purposes let's use the "hash" BPF_MAP_TYPE_HASH type.</p><p>We need to figure out the <i>"bpf(BPF_MAP_CREATE, map type, key size, value size, limit, flags)"</i> parameters. For our small TTL program, let's set 4 bytes for the key size and 8 bytes for the value size. The max element limit is set to 5. The exact number doesn't matter much, since we expect all the packets in one connection to carry the same TTL value anyway.</p><p>This is how it would look in <a href="https://github.com/cloudflare/cloudflare-blog/blob/master/2018-03-ebpf/ebpf.go#L57">Go code</a>:</p>
            <pre><code>bpfMapFd, err := ebpf.NewMap(ebpf.Hash, 4, 8, 5, 0)</code></pre>
            <p>A word of warning is needed here. BPF maps use the "locked memory" resource. With multiple BPF programs and maps, it's easy to exhaust the default tiny 64 KiB limit. Consider bumping this with <code>ulimit -l</code>, for example:</p>
            <pre><code>ulimit -l 10240</code></pre>
            <p>The <code>bpf()</code> syscall returns a file descriptor pointing to the kernel BPF map we just created. With it handy we can operate on a map. The possible operations are:</p><ul><li><p><code>bpf(BPF_MAP_LOOKUP_ELEM, &lt;key&gt;)</code></p></li><li><p><code>bpf(BPF_MAP_UPDATE_ELEM, &lt;key&gt;, &lt;value&gt;, &lt;flags&gt;)</code></p></li><li><p><code>bpf(BPF_MAP_DELETE_ELEM, &lt;key&gt;)</code></p></li><li><p><code>bpf(BPF_MAP_GET_NEXT_KEY, &lt;key&gt;)</code></p></li></ul><p>More on this later.</p><p>With the map created, we need to create a BPF program. As opposed to classical BPF - where the bytecode was a parameter to SO_ATTACH_FILTER - the bytecode is now loaded by the <code>bpf()</code> syscall. Specifically: <code>bpf(BPF_PROG_LOAD)</code>.</p><p>In our <a href="https://github.com/cloudflare/cloudflare-blog/blob/master/2018-03-ebpf/ebpf.go#L78-L131">Golang program the eBPF program setup</a> looks like:</p>
            <pre><code>ebpfInss := ebpf.Instructions{
	ebpf.BPFIDstOffSrc(ebpf.LdXW, ebpf.Reg0, ebpf.Reg1, 16),
	ebpf.BPFIDstOffImm(ebpf.JEqImm, ebpf.Reg0, 3, int32(htons(ETH_P_IPV6))),
	ebpf.BPFIDstSrc(ebpf.MovSrc, ebpf.Reg6, ebpf.Reg1),
	ebpf.BPFIImm(ebpf.LdAbsB, int32(-0x100000+8)),
...
	ebpf.BPFIDstImm(ebpf.MovImm, ebpf.Reg0, -1),
	ebpf.BPFIOp(ebpf.Exit),
}

bpfProgram, err := ebpf.NewProgram(ebpf.SocketFilter, &amp;ebpfInss, "GPL", 0)</code></pre>
            <p>Writing eBPF by hand is rather controversial. Most people use <code>clang</code> (from version 3.7 onwards) to compile code written in a C dialect into eBPF bytecode. The resulting bytecode is saved in an ELF file, which can be loaded by most eBPF libraries. This ELF file also includes a description of the maps, so you don’t need to set them up manually.</p><p>I personally don't see the point in adding an ELF/clang dependency for simple SO_ATTACH_BPF snippets. Don't be afraid of the raw bytecode!</p>
    <div>
      <h3>BPF calling convention</h3>
      <a href="#bpf-calling-convention">
        
      </a>
    </div>
    <p>Before we go further we should highlight a couple of things about the eBPF environment. The official kernel documentation isn't too friendly:</p><ul><li><p><a href="https://www.kernel.org/doc/Documentation/networking/filter.txt">Documentation/networking/filter.txt</a></p></li></ul><p>The first important bit to know is the calling convention:</p><ul><li><p>R0 - return value from in-kernel function, and exit value for eBPF program</p></li><li><p>R1-R5 - arguments from eBPF program to in-kernel function</p></li><li><p>R6-R9 - callee saved registers that in-kernel function will preserve</p></li><li><p>R10 - read-only frame pointer to access stack</p></li></ul><p>When the BPF program starts, R1 contains a pointer to <code>ctx</code>. This data structure <a href="https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/bpf.h#L799">is defined as <code>struct __sk_buff</code></a>. For example, to access the <code>protocol</code> field you'd need to run:</p>
            <pre><code>r0 = *(u32 *)(r1 + 16)</code></pre>
            <p>Or in other words:</p>
            <pre><code>ebpf.BPFIDstOffSrc(ebpf.LdXW, ebpf.Reg0, ebpf.Reg1, 16),</code></pre>
            <p>Which is exactly what we do in the first line of our program, since we need to choose between the IPv4 and IPv6 code branches.</p>
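            <p>To make the raw bytecode less mysterious, the sketch below packs a few of the instructions used in this post into the 8-byte layout of the kernel's <code>struct bpf_insn</code>: an opcode byte, two 4-bit register numbers, a 16-bit offset and a 32-bit immediate. The <code>insn</code> type and its <code>encode</code> method are names invented for this example (the Go helpers shown above do this work for you), and the byte order assumes a little-endian host.</p>

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// insn mirrors the fields of the kernel's struct bpf_insn.
type insn struct {
	Code uint8 // opcode
	Dst  uint8 // destination register, 0-10
	Src  uint8 // source register, 0-10
	Off  int16 // signed offset
	Imm  int32 // signed immediate
}

// Opcodes assembled from the constants in linux/bpf_common.h and linux/bpf.h.
const (
	opLdxW   = 0x61 // BPF_LDX | BPF_MEM | BPF_W
	opMovReg = 0xbf // BPF_ALU64 | BPF_MOV | BPF_X
	opMovImm = 0xb7 // BPF_ALU64 | BPF_MOV | BPF_K
	opExit   = 0x95 // BPF_JMP | BPF_EXIT
)

// encode packs the instruction as a little-endian host would store it:
// the destination register sits in the low nibble of the second byte.
func (i insn) encode() [8]byte {
	var b [8]byte
	b[0] = i.Code
	b[1] = i.Src<<4 | i.Dst&0x0f
	binary.LittleEndian.PutUint16(b[2:4], uint16(i.Off))
	binary.LittleEndian.PutUint32(b[4:8], uint32(i.Imm))
	return b
}

func main() {
	prog := []insn{
		{Code: opLdxW, Dst: 0, Src: 1, Off: 16}, // r0 = *(u32 *)(r1 + 16)
		{Code: opMovReg, Dst: 6, Src: 1},        // r6 = r1
		{Code: opMovImm, Dst: 0, Imm: -1},       // r0 = -1
		{Code: opExit},                          // exit
	}
	for _, in := range prog {
		b := in.encode()
		fmt.Printf("% x\n", b[:])
	}
}
```

            <p>Each instruction is a fixed eight bytes, which is a large part of why writing small programs by hand is practical.</p>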
    <div>
      <h3>Accessing the BPF payload</h3>
      <a href="#accessing-the-bpf-payload">
        
      </a>
    </div>
    <p>Next, there are special instructions for packet payload loading. Most BPF programs (but not all!) run in the context of packet filtering, so it makes sense to accelerate data lookups by having magic opcodes for accessing packet data.</p><p>Instead of dereferencing context, like <code>ctx-&gt;data[x]</code> to load a byte, BPF supports the <code>BPF_LD</code> instruction that can do it in one operation. There are caveats though, the documentation says:</p>
            <pre><code>eBPF has two non-generic instructions: (BPF_ABS | &lt;size&gt; | BPF_LD) and
(BPF_IND | &lt;size&gt; | BPF_LD) which are used to access packet data.

They had to be carried over from classic BPF to have strong performance of
socket filters running in eBPF interpreter. These instructions can only
be used when interpreter context is a pointer to 'struct sk_buff' and
have seven implicit operands. Register R6 is an implicit input that must
contain pointer to sk_buff. Register R0 is an implicit output which contains
the data fetched from the packet. Registers R1-R5 are scratch registers
and must not be used to store the data across BPF_ABS | BPF_LD or
BPF_IND | BPF_LD instructions.</code></pre>
            <p>In other words: before calling <code>BPF_LD</code> we must move <code>ctx</code> to R6, like this:</p>
            <pre><code>ebpf.BPFIDstSrc(ebpf.MovSrc, ebpf.Reg6, ebpf.Reg1),</code></pre>
            <p>Then we can call the load:</p>
            <pre><code>ebpf.BPFIImm(ebpf.LdAbsB, int32(-0x100000+7)),</code></pre>
            <p>At this stage the result is in r0, but we must remember that r1-r5 should be considered dirty. For an instruction, <code>BPF_LD</code> behaves very much like a function call.</p>
    <div>
      <h3>Magical Layer 3 offset</h3>
      <a href="#magical-layer-3-offset">
        
      </a>
    </div>
    <p>Next, note the load offset - we loaded the <code>-0x100000+7</code> byte of the packet. This magic offset is another BPF context curiosity. It turns out that a BPF program loaded under SO_ATTACH_BPF on a SOCK_STREAM (or SOCK_DGRAM) socket will only see Layer 4 and higher OSI layers by default. To extract the TTL we need access to the layer 3 header (i.e. the IP header). To access L3 in the L4 context, we must offset the data lookups by the magical -0x100000.</p><p>This magic constant, <code>SKF_NET_OFF</code>, <a href="https://github.com/torvalds/linux/blob/ead751507de86d90fa250431e9990a8b881f713c/include/uapi/linux/filter.h#L84">is defined in the kernel</a>.</p><p>For completeness, the <code>+7</code> is the offset of the Hop Limit field in an IPv6 header, so the load shown above belongs to the IPv6 branch. In an IPv4 header the TTL is at offset <code>+8</code>, which is what the IPv4 branch of our small BPF program loads.</p>
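            <p>The offsets can be checked against the IP header layouts. In the sketch below the constant <code>skfNetOff</code> mirrors the kernel's <code>SKF_NET_OFF</code>. The function <code>ttlFromL3Header</code> and the sample headers are made up for illustration and run in plain userspace Go, not inside BPF.</p>

```go
package main

import "fmt"

// skfNetOff mirrors SKF_NET_OFF from linux/filter.h.
const skfNetOff = -0x100000

const (
	ipv4TTLOffset      = 8 // TTL is the ninth byte of the IPv4 header
	ipv6HopLimitOffset = 7 // Hop Limit is the eighth byte of the IPv6 header
)

// ttlFromL3Header reads the TTL or Hop Limit from a raw IP header,
// choosing the offset from the version nibble in the first byte.
func ttlFromL3Header(hdr []byte) (ttl int, ok bool) {
	if len(hdr) == 0 {
		return 0, false
	}
	switch hdr[0] >> 4 {
	case 4:
		if len(hdr) >= 20 {
			return int(hdr[ipv4TTLOffset]), true
		}
	case 6:
		if len(hdr) >= 40 {
			return int(hdr[ipv6HopLimitOffset]), true
		}
	}
	return 0, false
}

func main() {
	// Hypothetical headers: only the version nibble and TTL byte are set.
	v4 := make([]byte, 20)
	v4[0], v4[ipv4TTLOffset] = 0x45, 58
	v6 := make([]byte, 40)
	v6[0], v6[ipv6HopLimitOffset] = 0x60, 60

	fmt.Println(ttlFromL3Header(v4))
	fmt.Println(ttlFromL3Header(v6))
	fmt.Printf("LD_ABS immediates: %d (IPv4), %d (IPv6)\n",
		skfNetOff+ipv4TTLOffset, skfNetOff+ipv6HopLimitOffset)
}
```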
    <div>
      <h3>Return value</h3>
      <a href="#return-value">
        
      </a>
    </div>
    <p>Finally, the return value of the BPF program is meaningful. In the context of packet filtering it will be interpreted as a truncated packet length. Had we returned 0, the packet would be dropped and wouldn't be seen by the userspace socket application. It's quite interesting that we can do packet-based data manipulation with eBPF on a stream-based socket. Anyway, our script returns -1, which when cast to unsigned will be interpreted as a very large number:</p>
            <pre><code>ebpf.BPFIDstImm(ebpf.MovImm, ebpf.Reg0, -1),
ebpf.BPFIOp(ebpf.Exit),</code></pre>
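            <p>The effect of the return value can be modelled in a few lines. This is only a userspace sketch of the rule quoted from <code>socket(7)</code> earlier. The function name <code>deliveredLen</code> is invented for the example, and it assumes the kernel treats the filter result as a 32-bit unsigned length.</p>

```go
package main

import "fmt"

// deliveredLen returns how many bytes of a packet reach the socket for
// a given filter return value: 0 drops the packet, a smaller value
// truncates it, anything else lets it through unmodified.
func deliveredLen(filterRet int32, pktLen uint32) uint32 {
	keep := uint32(filterRet) // -1 becomes 0xffffffff
	if keep < pktLen {
		return keep
	}
	return pktLen
}

func main() {
	fmt.Println(deliveredLen(-1, 1500))  // whole packet passes
	fmt.Println(deliveredLen(0, 1500))   // dropped
	fmt.Println(deliveredLen(100, 1500)) // truncated
}
```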
            
    <div>
      <h3>Extracting data from map</h3>
      <a href="#extracting-data-from-map">
        
      </a>
    </div>
    <p>Our running BPF program will set a key on our map for any matched packet. The key is the recorded TTL value, the value is the packet count. The value counter is somewhat vulnerable to a tiny race condition, but it's ignorable for our purposes. Later on, to extract the data from the userspace program we use this Go loop:</p>
            <pre><code>var (
	value   MapU64
	k1, k2  MapU32
)

for {
	ok, err := bpfMap.Get(k1, &amp;value, 8)
	if ok {
		// k1 is TTL, value is counter
		...
	}

	ok, err = bpfMap.GetNextKey(k1, &amp;k2, 4)
	if err != nil || !ok {
		break
	}
	k1 = k2
}</code></pre>
            
    <div>
      <h3>Putting it all together</h3>
      <a href="#putting-it-all-together">
        
      </a>
    </div>
    <p>Now with all the pieces ready we can make it a proper runnable program. There is little point in discussing it here, so allow me to refer to the source code. The BPF pieces are here:</p><ul><li><p><a href="https://github.com/cloudflare/cloudflare-blog/blob/master/2018-03-ebpf/ebpf.go">ebpf.go</a></p></li></ul><p>We haven't discussed how to catch the inbound SYN+ACK in the BPF program. This is a matter of setting up BPF before calling <code>connect()</code>. Sadly, it's impossible to customize <code>net.Dial</code> in Go. Instead we wrote a surprisingly painful and awful custom Dial implementation. The ugly custom dialer code is here:</p><ul><li><p><a href="https://github.com/cloudflare/cloudflare-blog/blob/master/2018-03-ebpf/magic_conn.go">magic_conn.go</a></p></li></ul><p>To run all this you need a 4.4+ kernel with the <code>bpf()</code> syscall compiled in. BPF features of specific kernels are documented in this superb page from BCC:</p><ul><li><p><a href="https://github.com/iovisor/bcc/blob/master/docs/kernel-versions.md">docs/kernel-versions.md</a></p></li></ul><p>Run the code to observe the Hop Distances:</p>
            <pre><code>$ ./ttl-ebpf tcp4://google.com:80 tcp6://google.com:80 \
             tcp4://cloudflare.com:80 tcp6://cloudflare.com:80
[+] TTL distance to tcp4://google.com:80 172.217.4.174 is 6
[+] TTL distance to tcp6://google.com:80 [2607:f8b0:4005:809::200e] is 4
[+] TTL distance to tcp4://cloudflare.com:80 198.41.215.162 is 3
[+] TTL distance to tcp6://cloudflare.com:80 [2400:cb00:2048:1::c629:d6a2] is 3</code></pre>
            
    <div>
      <h3>Takeaways</h3>
      <a href="#takeaways">
        
      </a>
    </div>
    <p>In this blog post we dived into the new eBPF machinery, including the <code>bpf()</code> syscall, maps and SO_ATTACH_BPF. This work allowed me to realize the potential of running SO_ATTACH_BPF on fully established TCP sockets. Undoubtedly, eBPF still requires plenty of love and documentation, but it seems to be a perfect bridge to expose low level toggles to userspace applications.</p><p>I highly recommend keeping the dependencies small. For small BPF programs, like the one shown, there is little need for complex clang compilation and ELF loading. Don't be afraid of the eBPF bytecode!</p><p>We only touched on SO_ATTACH_BPF, where we analyzed network packets with BPF running on network sockets. There is more! First, you can attach BPFs to a dozen "things", XDP being the most obvious example. <a href="https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/bpf.h#L119">Full list</a>. Then, it's possible to actually affect kernel packet processing, here is a <a href="https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/bpf.h#L318">full list of helper functions</a>, some of which can modify kernel data structures.</p><p>In February <a href="https://lwn.net/Articles/747551/">LWN jokingly wrote</a>:</p>
            <pre><code>Developers should be careful, though; this could
prove to be a slippery slope leading toward something 
that starts to look like a microkernel architecture.</code></pre>
            <p>There is a grain of truth here. Maybe the ability to run eBPF on a variety of subsystems feels like microkernel coding, but SO_ATTACH_BPF definitely smells like the <a href="https://en.wikipedia.org/wiki/STREAMS">STREAMS</a> programming model <a href="https://cseweb.ucsd.edu/classes/fa01/cse221/papers/ritchie-stream-io-belllabs84.pdf">from 1984</a>.</p><hr /><p>Thanks to <a href="https://twitter.com/akajibi">Gilberto Bertin</a> and <a href="https://twitter.com/dwragg">David Wragg</a> for helping out with the eBPF bytecode.</p><hr /><p><i>Does doing eBPF work sound interesting? Join our </i><a href="https://boards.greenhouse.io/cloudflare/jobs/589572"><i>world famous team</i></a><i> in London, Austin, San Francisco, Champaign and our elite office in Warsaw, Poland</i>.</p> ]]></content:encoded>
            <category><![CDATA[TTL]]></category>
            <category><![CDATA[TCP]]></category>
            <category><![CDATA[IPv4]]></category>
            <category><![CDATA[IPv6]]></category>
            <category><![CDATA[eBPF]]></category>
            <guid isPermaLink="false">6LTjo1ZkNHiWvy46IuloEd</guid>
            <dc:creator>Marek Majkowski</dc:creator>
        </item>
        <item>
            <title><![CDATA[DDoS Packet Forensics: Take me to the hex!]]></title>
            <link>https://blog.cloudflare.com/ddos-packet-forensics-take-me-to-the-hex/</link>
            <pubDate>Tue, 06 Jan 2015 23:10:34 GMT</pubDate>
            <description><![CDATA[ A few days ago, my colleague Marek sent an email about a DDoS attack against one of our DNS servers that we'd been blocking with our BPF rules. ]]></description>
            <content:encoded><![CDATA[ <p>A few days ago, my colleague Marek sent an email about a DDoS attack against one of our DNS servers that we'd been blocking with our <a href="/bpf-the-forgotten-bytecode/">BPF rules</a>. He noticed that there seemed to be a strange correlation between the TTL field in the IP header and the IPv4 source address.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6g3lAZht4S4HBqWLjL3y33/9a9631d741d3a9b895d1ad50e39bd241/30957385_bfb4b7fb79_z.jpg" />
            
            </figure><p><a href="https://creativecommons.org/licenses/by/2.0">CC BY 2.0</a> <a href="https://www.flickr.com/photos/adactio/30957385/">image</a> by <a href="https://www.flickr.com/photos/adactio/">Jeremy Keith</a></p><p>The source address was being spoofed, as usual, and apparently chosen randomly, but something else was going on. He offered a bottle of Scotch to the first person to come up with a satisfactory solution.</p><p>Here's what some of the packets looked like:</p>
            <pre><code>$ tcpdump -ni eth0 -c 10 "ip[8]=40 and udp and port 53"
1.181.207.7.46337 &gt; x.x.x.x.53: 65098+
1.178.97.141.45569 &gt; x.x.x.x.53: 65101+
1.248.136.142.63489 &gt; x.x.x.x.53: 65031+
1.207.241.195.52993 &gt; x.x.x.x.53: 65072+

$ tcpdump -ni eth0 -c 10 "ip[8]=41 and udp and port 53"
2.10.30.2.2562 &gt; x.x.x.x.53: 65013+
2.4.9.36.1026 &gt; x.x.x.x.53: 65019+
2.98.1.99.25090 &gt; x.x.x.x.53: 64925+
2.109.69.229.27906 &gt; x.x.x.x.53: 64914+

$ tcpdump -ni eth0 -c 10 "ip[8]=42 and udp and port 53"
4.72.42.184.18436 &gt; x.x.x.x.53: 64439+
4.240.78.0.61444 &gt; x.x.x.x.53: 64271+
5.73.44.84.18693 &gt; x.x.x.x.53: 64182+
4.69.99.10.17668 &gt; x.x.x.x.53: 64442+</code></pre>
            <p>I've removed the destination IP address, but left in the DNS ID number (the number with the plus after it), the spoofed source IP, and port. Filtering on ip[8], the TTL byte of the IPv4 header, separates the three TTLs shown here (40, 41, and 42).</p><p>Stop reading here if you want to go figure this out for yourself.</p>
    <div>
      <h3>Into the hex</h3>
      <a href="#into-the-hex">
        
      </a>
    </div>
    <p>I couldn't resist Marek's challenge, so I did what I always do with anything involving packets: I went straight for the hex. Since Marek hadn't given me a pcap, I manually converted the first few to hex.</p><p>The reason hex is useful is that bytes, words, dwords, and qwords are the real stuff of computing and looking at decimal obscures data.</p><p>Taking the first three (one for each TTL), I saw this pattern:</p>
            <pre><code>1.181.207.7.46337
2.10.30.2.2562
4.72.42.184.18436

Converted to hex they are

01.b5.cf.07.b501
02.0a.1e.02.0a02
04.48.2a.b8.4804</code></pre>
            <p>It's immediately obvious that the 'random' source port is the first two bytes of the random IP source address reversed: <code>01.b5.cf.07</code> has source port <code>b501</code>.</p><p>A little bit of tinkering revealed a relationship between the TTL and the first byte of the IP address.</p>
            <pre><code>TTL = (first byte &gt;&gt; 1) + 40</code></pre>
            <p>This relationship was confirmed by a later packet that had a TTL of 150 and a source IP of 220.255.141.181. The shift right also explained why the same TTL was used when the first byte differed by one (see the third group above for an example: 4.x.x.x and 5.x.x.x have the same TTL).</p><p>I then spotted that the DNS ID field was also related to the IP address.</p>
            <pre><code>1.181.207.7.46337 &gt; x.x.x.x.53: 65098+

Converted to hex:

01.b5.cf.07.b501 &gt; x.x.x.x.35: fe4a+</code></pre>
            <p>It's probably not obvious at first glance (unless you're a hex-level hacker), but fe4a is the one's complement of 01b5. So the DNS ID field was simply the one's complement of the first two bytes of the source IP.</p><p>Finally, Marek gave me a pcap, and I had one more relationship to find. The ID value in the IPv4 header was also related to the random source IP address—in fact, it was just the first two bytes of the IP address.</p>
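            <p>All four relationships can be written down compactly. The Go sketch below derives the source port, TTL, DNS ID and IP ID from a spoofed source address exactly as described above. The <code>derived</code> type and <code>derive</code> function are names made up for this illustration, not anything recovered from the attack tool.</p>

```go
package main

import "fmt"

// derived holds the header fields that follow from the source address.
type derived struct {
	SrcPort uint16
	TTL     uint8
	DNSID   uint16
	IPID    uint16
}

// derive applies the relationships observed in the attack packets.
func derive(ip [4]byte) derived {
	firstTwo := uint16(ip[0])<<8 | uint16(ip[1])
	return derived{
		SrcPort: uint16(ip[1])<<8 | uint16(ip[0]), // first two bytes, reversed
		TTL:     ip[0]>>1 + 40,                    // (first byte >> 1) + 40
		DNSID:   ^firstTwo,                        // one's complement
		IPID:    firstTwo,                         // first two bytes as-is
	}
}

func main() {
	for _, ip := range [][4]byte{
		{1, 181, 207, 7},
		{2, 10, 30, 2},
		{4, 72, 42, 184},
	} {
		fmt.Printf("%v: %+v\n", ip, derive(ip))
	}
}
```

            <p>Running it against the addresses in the tcpdump output reproduces the ports, TTLs and DNS IDs shown there.</p>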
    <div>
      <h3>Mystery</h3>
      <a href="#mystery">
        
      </a>
    </div>
    <p>One mystery remains (and there's a CloudFlare T-shirt for the person with the most convincing explanation): How are the random source IPs being generated?</p><p>The answer might be boring (i.e. it might just be reading bytes from /dev/random), but given the author's love of relationships between fields, perhaps there's something else going on. We've seen IP addresses get reused, which leads us to think that there's something to discover here.</p><p>Here's a sequence of actual source IPs seen:</p>
            <pre><code>218.254.187.151
8.187.160.236
123.73.134.186
68.133.199.20
205.26.91.155
169.235.56.120
96.160.119.221
44.226.72.236
205.26.91.155
140.206.27.92
70.62.151.0
161.98.197.249</code></pre>
            <p>We have no real guarantee that the attacker generated them in this order, but perhaps there's an interesting way these are being generated.</p><p>Answers in comments if you manage to figure it out!</p> ]]></content:encoded>
            <category><![CDATA[DDoS]]></category>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[Attacks]]></category>
            <category><![CDATA[IPv4]]></category>
            <category><![CDATA[TTL]]></category>
            <guid isPermaLink="false">6c3dtzFmbR4HToIqhUW8aa</guid>
            <dc:creator>John Graham-Cumming</dc:creator>
        </item>
        <item>
            <title><![CDATA[CloudFlare DNS is simple, fast and flexible]]></title>
            <link>https://blog.cloudflare.com/cloudflare-dns-is-simple-fast-and-flexible/</link>
            <pubDate>Thu, 30 Jan 2014 09:24:00 GMT</pubDate>
            <description><![CDATA[ Over the past few years, the CloudFlare blog has covered a great range of different topics, drilling down into the technology we use to both protect websites from attack, and optimise them so that they load faster for visitors. ]]></description>
            <content:encoded><![CDATA[ <p>Over the past few years, the CloudFlare blog has covered a great range of different topics, drilling down into the technology we use to both protect websites from attack, and optimise them so that they load faster for visitors.</p><p>One thing we haven't spent enough time talking about so far also happens to be at the core of the way our service, and any other service on the Internet, works: <a href="https://www.cloudflare.com/learning/dns/what-is-dns/">DNS</a>. CloudFlare offers DNS services for free: you don't even need to use any of our other free performance or security features to take advantage of our DNS. One of the side effects of building a network such as ours is that our DNS has properties unlike those of nearly all other DNS providers, even within the cutting-edge world of <a href="https://www.cloudflare.com/learning/cdn/what-is-a-cdn/">Content Delivery Networks</a>. These unique properties allow you to do things that would be much harder or impossible with other providers.</p>
    <div>
      <h3>A bit about DNS</h3>
      <a href="#a-bit-about-dns">
        
      </a>
    </div>
    <p>First though, a recap for the uninitiated. DNS (Domain Name System) is the way in which human-readable addresses such as <code>www.cloudflare.com</code> are turned into the IP addresses that computers use to communicate with each other, such as <code>198.41.213.157</code> or <code>2400:cb00:2048:1::c629:d59d</code>.</p><p>You can think of it as the world's greatest phone book, turning human-readable names such as Joe Bloggs into the string of numbers you need to call them. DNS entries (or 'records', in the correct parlance) are served by DNS 'nameservers', which hold the records for a particular <a href="https://www.cloudflare.com/learning/dns/glossary/what-is-a-domain-name/">domain name</a> or list of domains, such as <code>cloudflare.com</code>. (As a side note, a common misconception is that the domain name includes the <code>www</code> at the beginning: the address <code>www.cloudflare.com</code> is actually a <i>subdomain</i> of the <code>cloudflare.com</code> domain.) Anyone can run a DNS nameserver, and become part of the DNS network.</p><p>When you enter the web site name <code>www.cloudflare.com</code> into your web browser and hit enter, the first thing the browser does is ask a DNS 'resolver' server (usually provided by your ISP) to find the machine-readable IP address that corresponds with the human-readable domain name. In order to do that, much like a postman searching for the correct house to post a letter, the DNS resolver starts at the end of the address, and works its way to the beginning.</p><p>First, it asks the 'root' nameservers for the location of the <code>.com</code> records. Once that's found, it asks the <code>.com</code> nameservers for the location of the nameservers for the <code>cloudflare.com</code> domain. Once it has that, it can finally ask the <code>cloudflare.com</code> nameservers for the IP address of <code>www.cloudflare.com</code>. The web browser can then carry on with the web request, now that it has the right IP address to send it to.</p><p>Note that in reality it's not necessary to find the root server and then the <code>.com</code> server every time, because the DNS resolver will cache that information.</p>
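    <p>The right-to-left walk described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the <code>ZONES</code> table, the server labels and the <code>resolve</code> function are all made up for illustration, and no real DNS queries are sent.</p>

```python
# Toy delegation data: every server label below is made up for illustration
# and does not describe real DNS infrastructure.
ZONES = {
    "root": {"com.": "com-nameserver"},
    "com-nameserver": {"cloudflare.com.": "cloudflare-nameserver"},
    "cloudflare-nameserver": {"www.cloudflare.com.": "198.41.213.157"},
}


def suffixes(name):
    """Yield the name's suffixes from shortest to longest: com., cloudflare.com., ..."""
    labels = name.rstrip(".").split(".")
    for i in range(len(labels) - 1, -1, -1):
        yield ".".join(labels[i:]) + "."


def resolve(name):
    """Start at the root and follow one delegation per step until the full name is answered."""
    full_name = name.rstrip(".") + "."
    server = "root"
    path = [server]
    for suffix in suffixes(name):
        answer = ZONES[server][suffix]
        if suffix == full_name:
            return answer, path  # final answer: the IP address
        server = answer  # otherwise the answer names the next nameserver to ask
        path.append(server)
    raise LookupError(name)
```

    <p>Calling <code>resolve("www.cloudflare.com")</code> visits the root, then the <code>.com</code> nameserver, then the <code>cloudflare.com</code> nameserver, which is the same order as the steps above.</p>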
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6TzSCOgjZB69oUHQYwqmHn/f4796105d6f305b48cc40b5eb503b035/dnsrequest.png" />
            
            </figure>
    <div>
      <h3>In theory, there is no difference between theory and practice. In practice, there is.</h3>
      <a href="#in-theory-there-is-no-difference-between-theory-and-practice-in-practice-there-is">
        
      </a>
    </div>
    <p>All great in theory, but there are a few ways in which this process is hampered. Firstly, let's say you run a web site with a DNS nameserver in London, and someone from San Francisco wants to visit your site. To find the correct IP address, they need to make a request all the way from San Francisco to London. A distance of 8,611 km is going to add some latency to the request, and slow down the load time of your web site. Secondly, DNS records have a property known as Time To Live (or TTL). This specifies the length of time that a DNS record should be cached by resolvers, such as an ISP's, before being refreshed. This caching means that lookups for a commonly accessed name like <code>www.google.com</code> aren't constantly being sent to Google's nameservers. The lower the TTL (it's usually specified in seconds), the more often DNS resolvers will ask the nameserver for the DNS records of a particular domain name.</p><p>Often, to save on traffic and processing power, web hosts and other DNS providers will set this to hours or even days. That means that if you want to make some changes to your DNS, it can often take hours or days for the change to be seen by visitors to your web site, as the resolvers don't know to check for changes to your domain. (Web hosts often refer to this as 'propagation time', though the idea that DNS records need to 'propagate out' is largely a myth: in most cases, it's the TTL set by the web host or DNS provider that causes delays in DNS changes.) Also, like all servers, DNS nameservers will fail from time to time, and when they do your web site will drop off the internet.</p>
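    <p>To make the effect of a TTL concrete, here is a minimal sketch of the kind of cache a resolver might keep. The <code>TTLCache</code> class and its method names are hypothetical, not taken from any real resolver; the only point is that a record is reused until its TTL runs out, so a change at the nameserver stays invisible until then.</p>

```python
import time


class TTLCache:
    """Minimal resolver-style cache: an entry is reused until its TTL runs out."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so the behaviour can be tested without waiting
        self._entries = {}   # name -> (address, expiry time)

    def put(self, name, address, ttl_seconds):
        """Store an answer together with the moment it stops being valid."""
        self._entries[name] = (address, self._clock() + ttl_seconds)

    def get(self, name):
        """Return the cached address, or None when it is missing or has expired."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # expired: the resolver must ask the nameserver again
            return None
        return address
```

    <p>With a TTL of a day, <code>get</code> keeps returning the old address for up to 24 hours after the record changes; with a TTL of 300 seconds, the stale answer lasts at most five minutes.</p>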
    <div>
      <h3>Anycast DNS</h3>
      <a href="#anycast-dns">
        
      </a>
    </div>
    <p>The way CloudFlare routes traffic allows us to get around these issues, and provide interesting advantages to boot. As with the rest of CloudFlare's networking, our DNS works over Anycast. That means that instead of having to make requests all the way back to the web host, who could be on a different continent, the request is instead made to the nearest of CloudFlare's 24 data centres worldwide. For example, I'm based in the UK, but <code>cloudflare.com</code> is hosted in California. Even so, when I perform a traceroute to CloudFlare's DNS nameserver <code>dns2.cloudflare.com</code>, it's only 7 short hops to reach a server in our London data centre.</p>
<pre><code>$ traceroute dns2.cloudflare.com
traceroute to dns2.cloudflare.com (173.245.58.99), 64 hops max, 52 byte packets
1  10.0.1.1 (10.0.1.1)  1.572 ms  1.875 ms  2.047 ms
2  lo0-central10.pcl-ag07.plus.net (195.166.128.188)  16.613 ms  16.897 ms  15.741 ms
3  link-a-central10.pcl-gw01.plus.net (212.159.2.184)  15.757 ms  15.601 ms  15.641 ms
4  xe-9-0-0.pcl-cr01.plus.net (212.159.0.216)  15.787 ms  16.378 ms  15.766 ms
5  ae1.ptw-cr01.plus.net (195.166.129.0)  15.888 ms  15.742 ms  15.549 ms
6  195.66.225.179 (195.66.225.179)  46.333 ms * *
7  dns2.cloudflare.com (173.245.58.99)  16.925 ms  15.713 ms  15.698 ms</code></pre>
    <p>This can significantly reduce the time that your web site takes to load for visitors, anywhere in the world. Not only that, but because there's no one single physical DNS server, the chances that DNS would fail for the domain are greatly reduced.</p>
    <div>
      <h3>Short TTLs</h3>
      <a href="#short-ttls">
        
      </a>
    </div>
    <p>To solve the issue of having to wait ages for new DNS records to take effect, CloudFlare has TTLs of 5 minutes on all DNS records by default. That means that if you change a DNS record, or add a new one, you can expect visitors to your site to see the change in under 5 minutes (2 and a half minutes on average).</p><p>As well as never having to wait for DNS propagation again, this allows you to do things with DNS records that you wouldn't be able to do with other providers. Let's say you've been working on a replacement for your current website in a staging area on a different IP address, and it's now ready to go live. All you need to do is change the IP address set through your account with us, and within minutes the new site will be live and publicly accessible.</p><p>Alternatively, let's say you have a server at home which contains personal files such as documents, music and other things you might want to access from work or a coffee shop. Some DNS providers, including CloudFlare, offer an API so you can <a href="https://github.com/scotchmist/dyndns-cf">write a script</a> to constantly update an address such as <code>home.example.com</code> to point to your server at home, similar to a service such as DynDNS but with your own domain name. With a TTL of hours or days, an update to your home's IP address would make such a script pretty unreliable, as you would need to wait for your domain's records to change. With our DNS though, any downtime for that address wouldn't ever last more than a handful of minutes, allowing you to access your files remotely at all times.</p><p>Low TTLs can have more serious uses as well: let's say you run an enterprise-level service, and near-100% uptime is a major concern. If one of your servers stopped working, a quick DNS change to the IP address of your fail-over server is all that's needed to get it back up again. As before, this process is scriptable, and can be set to happen automatically if the script detects issues connecting to the primary server.</p><p>Another advantage of low TTLs is that we can move the IP addresses used for customers around at short notice. This is really important for CloudFlare as an attack mitigation service. When <a href="/the-ddos-that-almost-broke-the-internet">an attack</a> reaches our network, our primary goal is to isolate it so that it doesn't affect any other customers, and 'null-route' it so that it doesn't go anywhere. Lowering the TTLs for DNS records even further allows us to do this on-the-fly, allowing us to react to attacks within a few seconds of them hitting our network edge.</p>
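    <p>The automatic fail-over described above might look something like the following sketch. It is an illustration under stated assumptions, not a recipe: the health check is a plain TCP connection attempt, and <code>update_record</code> is a hypothetical callback that you would write yourself around your DNS provider's API. No real API calls are shown here.</p>

```python
import socket


def is_reachable(host, port=80, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_target(primary_ip, failover_ip, check=is_reachable):
    """Pick the address the DNS record should point at right now."""
    return primary_ip if check(primary_ip) else failover_ip


def fail_over_if_needed(current_ip, primary_ip, failover_ip, update_record, check=is_reachable):
    """Update the record only when the desired target differs from the current one.

    update_record is a hypothetical callback wrapping your DNS provider's API.
    """
    target = choose_target(primary_ip, failover_ip, check)
    if target != current_ip:
        update_record(target)
    return target
```

    <p>Run on a schedule, a script like this switches the record to the fail-over address when the primary stops answering, and back again once it recovers. How quickly visitors follow depends on the record's TTL.</p>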
    <div>
      <h3>Conclusion</h3>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>Looking to the future, we are investigating other things we can do with the network we've built, and other DNS-related services we can add on top of the ones we already offer. One interesting feature would be the ability for traffic to be sent to different IP addresses depending on where the visitor is located, allowing customers with servers in multiple data centres to direct requests based on where they come from. For the time being though, our DNS offers some of the best reliability and speed in the industry, and can help to speed up your website, for free, regardless of whether you use any of the other services CloudFlare offers.</p><hr /><p><i>Sam Howson is a member of the Support team for CloudFlare, based in London. When he isn't helping customers to take advantage of CloudFlare's awesome features, he enjoys irritating his housemates with the sounds of his violin and mandolin, going to gigs, concerts &amp; plays and exploring London on his bike.</i></p><p><i>CloudFlare is hiring! Do you love helping people, and want to work in one of the fastest growing and most exciting companies in tech? If so, check out our </i><a href="https://www.cloudflare.com/join-our-team"><i>careers page</i></a><i>. We are looking for team members for both our London and San Francisco offices, in a variety of roles including Technical Support.</i></p> ]]></content:encoded>
            <category><![CDATA[DNS]]></category>
            <category><![CDATA[TTL]]></category>
            <category><![CDATA[Reliability]]></category>
            <guid isPermaLink="false">U6vzg1DBkYT7nt9wjoHRx</guid>
            <dc:creator>Sam Howson</dc:creator>
        </item>
        <item>
            <title><![CDATA[Edge Cache Expire TTL: Easiest way to override any existing headers]]></title>
            <link>https://blog.cloudflare.com/edge-cache-expire-ttl-easiest-way-to-override/</link>
            <pubDate>Fri, 01 Feb 2013 22:27:00 GMT</pubDate>
            <description><![CDATA[ CloudFlare makes caching easy. Our service automatically determines what files to cache based on file extensions. Performance benefits kick in automatically. ]]></description>
            <content:encoded><![CDATA[ <p></p><p>CloudFlare makes caching easy. Our service automatically determines what files to cache based on file extensions. Performance benefits kick in automatically.</p><p>For customers that want advanced caching, beyond the defaults, we have <a href="/introducing-pagerules-advanced-caching">Cache Everything</a> available as Page Rules. Designate a URL and CloudFlare will cache everything, including HTML, out at the edges of our global network.</p><p>With Cache Everything, we respect all headers. If there is any header in place from the server or a CMS solution like WordPress, we will respect it. However, we got many requests from customers who wanted an easy way to override any existing headers. Today, we are releasing a new feature called 'Edge cache expire TTL' that does just that.</p>
    <div>
      <h3>What is Edge Cache Expire TTL?</h3>
      <a href="#what-is-edge-cache-expire-ttl">
        
      </a>
    </div>
    <p>Edge cache expire TTL is the setting that controls how long CloudFlare's edge servers will cache a resource before requesting a fresh copy from your server. When you create a Cache Everything Page Rule, you now may choose whether to respect all existing headers or to override any headers that are in place from your server. By overwriting the headers, CloudFlare will cache more content at the CloudFlare edge network, decreasing load to your server.</p><p>Common situations where you may choose to overwrite existing headers:</p><ul><li><p>You expect a large surge in traffic</p></li><li><p>You are under DDoS attack</p></li><li><p>You are not sure what the headers on WordPress or your server are set to</p></li><li><p>You are using WordPress and want to easily overwrite the default settings</p></li></ul><p>It is important to emphasize when you <i>do not</i> want to use Cache Everything. If you have any personalized information on the page like login information or credit card information, you do not want to use the Cache Everything option.</p><p><strong>What is Browser Cache Expire TTL?</strong></p><p>Browser cache expire TTL is the time that CloudFlare instructs a visitor's browser to cache a resource. Until this time expires, the browser will load the resource from its local cache thus speeding up the request significantly. CloudFlare will respect the headers that you give us from your web server, and then we will communicate with the browser based on the time selected in this drop down menu.</p>
    <div>
      <h4>Using both Edge Cache Expire TTL and Browser Cache Expire TTL</h4>
      <a href="#using-both-edge-cache-expire-ttl-and-browser-cache-expire-ttl">
        
      </a>
    </div>
    <p>When you'd like to have CloudFlare cache your content but want your visitors to always get a fresh copy of the page, you can use the new 'Edge cache expire TTL' setting to express this differentiation. Set a value for 'Edge cache expire TTL' to how often you want the CloudFlare CDN to refresh from your server, and 'Browser cache expire TTL' to how often you want your visitors' browsers to refresh the page content. This is useful when you have a rapidly changing page but still want the benefit of the CloudFlare cache to reduce your server load.</p>
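    <p>One way to picture how the two settings interact is the small model below. It is a simplification for illustration only: the <code>served_from</code> function and its parameters are made up, and it ignores everything except the two TTLs, treating a browser TTL of 0 as "always ask again".</p>

```python
def served_from(browser_age, edge_age, browser_ttl, edge_ttl):
    """Say which layer answers a request.

    Ages are seconds since each layer last stored the resource, or None if
    that layer holds no copy. TTLs are in seconds.
    """
    if browser_age is not None and browser_ttl > browser_age:
        return "browser cache"  # still fresh locally, no request leaves the browser
    if edge_age is not None and edge_ttl > edge_age:
        return "edge cache"     # browser copy expired, but the edge copy is fresh
    return "origin server"      # both expired or missing: go back to your server
```

    <p>For a rapidly changing page, a short browser TTL with a longer edge TTL means most requests land in the middle case: visitors re-request the page often, but those requests are answered from the edge rather than from your server.</p>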
    <div>
      <h4>Plan Details</h4>
      <a href="#plan-details">
        
      </a>
    </div>
    <p>CloudFlare offers a range of edge cache expire TTLs based on plan type:</p><ul><li><p>Free: 2 hours</p></li><li><p>Pro: 1 hour</p></li><li><p>Business: 30 minutes</p></li><li><p>Enterprise: as low as 30 seconds</p></li></ul><p>A Pro customer may set the refetch time to 1 hour. After 60 minutes, we return to your server for a fresh copy of the resource. Business customers may lower the refetch interval to 30 minutes. Enterprise customers may set this interval as low as 30 seconds.</p>
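    <p>Read as per-plan minimums, the limits above can be written down as a small lookup. This is a sketch, not how the dashboard behaves: the <code>effective_edge_ttl</code> function is hypothetical, and raising a too-low request to the plan minimum is an assumption made for the example (the actual interface offers a drop down menu of allowed values).</p>

```python
# Minimum edge cache expire TTL per plan, in seconds, as listed above.
MIN_EDGE_TTL = {
    "free": 2 * 60 * 60,
    "pro": 60 * 60,
    "business": 30 * 60,
    "enterprise": 30,
}


def effective_edge_ttl(plan, requested_seconds):
    """Raise a requested TTL to the plan's minimum; reject unknown plan names."""
    try:
        minimum = MIN_EDGE_TTL[plan.lower()]
    except KeyError:
        raise ValueError(f"unknown plan: {plan!r}") from None
    # Assumption for this sketch: requests below the minimum are raised to it.
    return max(requested_seconds, minimum)
```

    <p>Under these assumptions, a Pro customer asking for a 10 minute TTL would end up with 1 hour, while an Enterprise customer could go all the way down to 30 seconds.</p>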
    <div>
      <h4>How to Turn It On</h4>
      <a href="#how-to-turn-it-on">
        
      </a>
    </div>
    <p>Log in to your CloudFlare account and choose "Page Rules" from under the gear icon. Enter the URL that you want to Cache Everything (under Custom Caching):</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5kXsRHrZ4nstgW7TaJlKmV/073e6b665580d4c944ffb78117f293cf/Cache_Everything.tiff.scaled500.jpg" />
            
            </figure><p>The 'Edge cache expire TTL' option will appear:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4wwlEobfieqiWI9Jg1kEqO/97302650c2c19a907a729dfbcd8c19ff/Edge_cache_expire_TTL_appears.tiff.scaled500.jpg" />
            
            </figure><p>The default setting is set to "Respect all existing headers." To override this setting, choose a time from the drop down menu:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3peJFc7kv5OTBpMknvhfT1/c6d6cc45f75697ccf769ef9333438ca0/Edge_cache_expire_TTL_dropdown.tiff.scaled500.jpg" />
            
            </figure><p>You can find more information in our knowledge base articles <a href="https://support.cloudflare.com/entries/23023893-what-does-edge-cache-expire-ttl-mean">here</a> and <a href="https://support.cloudflare.com/entries/23009261-what-does-browser-cache-expire-ttl-mean">here.</a></p><p>Give it a try and let us know what you think.</p> ]]></content:encoded>
            <category><![CDATA[Cache]]></category>
            <category><![CDATA[TTL]]></category>
            <category><![CDATA[Speed & Reliability]]></category>
            <guid isPermaLink="false">1tzqjBlO9nVgVIRr0QDLxC</guid>
            <dc:creator>Michelle Zatlyn</dc:creator>
        </item>
        <item>
            <title><![CDATA[Never Deal With DNS Propagation Again]]></title>
            <link>https://blog.cloudflare.com/never-deal-with-dns-propagation-again/</link>
            <pubDate>Sat, 19 May 2012 02:13:00 GMT</pubDate>
            <description><![CDATA[ At CloudFlare, we think a lot about the Domain Name System -- better known as DNS, the Internet's address book. CloudFlare uses DNS to bring performance and security to our customers.  ]]></description>
            <content:encoded><![CDATA[ <p>At CloudFlare, we think a lot about the <a href="https://www.cloudflare.com/learning/dns/what-is-dns/">Domain Name System</a> -- better known as DNS, the Internet's address book. CloudFlare uses DNS to bring performance and security to our customers. But we don't spend a lot of time talking about how our DNS service works behind the scenes.</p>
    <div>
      <h3>The "CloudFlare benefit"</h3>
      <a href="#the-cloudflare-benefit">
        
      </a>
    </div>
    <p>One recent customer comment reminded us of an unsung advantage:</p><blockquote><p><i>"The thing is, we had this new design running on a QA server for the past week, doing final testing and updating. Yesterday we changed the IP to point the domain to the new QA server, away from the "old" prod server.</i></p><p><i>BUT unlike every other time when we change a site's IP - the change was instantaneous, since the only IP the world has for our sites is CloudFlare, so when we change servers, IPs, etc.</i></p><p><i>So with CloudFlare, instant server move, web host move, IP change -- never deal with DNS propagation again. That is awesome."</i></p></blockquote>
    <div>
      <h3>No More Waiting</h3>
      <a href="#no-more-waiting">
        
      </a>
    </div>
    <p>Because the public IP addresses don't change, using CloudFlare removes the wait for TTL (Time To Live) record expiration and DNS propagation when you make a change to one of your site's DNS records. This waiting period is often an excruciating part of development and deployment changes, because it's completely out of your control. Yes, you can lower your TTLs ahead of time, but it's still a seemingly endless wait for the world's recursive DNS servers to come get the new record.</p><p>If DNS isn't something you spend a lot of time thinking about, that's fine. Just know that once you're on CloudFlare, your changes are live faster, and you can get back to everything else on your plate. Making running a website easier...that's part of CloudFlare's goal, even in the small things.</p><p>P.S. - We recently made <a href="/managing-dns-records/">major improvements</a> to our DNS management interface. You can also manage DNS via <a href="http://www.cloudflare.com/docs/client-api.html">API</a>.</p> ]]></content:encoded>
            <category><![CDATA[TTL]]></category>
            <guid isPermaLink="false">5GremFsvESMEbO5jkMWB3D</guid>
            <dc:creator>John Roberts</dc:creator>
        </item>
    </channel>
</rss>