Many web performance experts recommend putting all critical resources in the first 14 KB of your web page. This is based on a bit of understanding of TCP which underlies each HTTP connection – at least for now until HTTP/3 and QUIC come along. But is it really true? In my book HTTP/2 in Action I suggested that this 14 KB statistic was no longer really that relevant in the modern world, if it ever was, and was asked to expand upon this - hence this post. But let me caveat it by saying that web performance is massively important, and there are many, many, many use cases showing this, so I'm not arguing against it and people should optimise this. Just don't get too hung up on this 14 KB number.
TCP is a guaranteed delivery protocol and uses a number of methods to achieve this. First, it acknowledges all TCP packets and resends any unacknowledged packets. Additionally, it plays nicely with the network: it starts slow and builds up to full capacity in a process known as TCP slow start, gently feeling its way and checking it's not overwhelming anything and causing lost (unacknowledged) packets. The combination of these two things means that when TCP starts it allows 10 TCP packets to be sent before they must be acknowledged. As those packets are acknowledged it allows more packets to be sent, doubling up to allow 20 packets to be sent next time, then 40, then 80, and so on, as it gradually builds up to the full capacity the network can handle. The 14 KB magic number comes from the fact that each TCP packet can be up to 1500 bytes, but 40 of those bytes are taken up by the IP and TCP headers, leaving 1460 bytes for actual data. 10 of those packets means you can deliver 14,600 bytes or about 14 KB (14.25 KB actually).
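The arithmetic behind the magic number is easy to check. The snippet below is just a sketch using the figures quoted above (1500-byte packets, 40 bytes of overhead, an initial window of 10 packets); the function names are made up for illustration, and the doubling is the idealised version of slow start.

```python
# Figures taken from the text: a 1500-byte packet, 40 bytes of
# header overhead, and an initial window of 10 packets.
PACKET_SIZE = 1500
HEADER_OVERHEAD = 40
INITIAL_WINDOW_PACKETS = 10


def initial_window_bytes(packets=INITIAL_WINDOW_PACKETS):
    """Payload bytes that fit in the first unacknowledged burst."""
    payload_per_packet = PACKET_SIZE - HEADER_OVERHEAD  # 1460 bytes
    return packets * payload_per_packet


def slow_start_windows(rounds, initial=INITIAL_WINDOW_PACKETS):
    """Idealised window size (in packets) for each successive round trip."""
    return [initial * 2 ** i for i in range(rounds)]


print(initial_window_bytes())         # 14600
print(initial_window_bytes() / 1024)  # 14.2578125
print(slow_start_windows(4))          # [10, 20, 40, 80]
```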
Now networks are incredibly fast – pretty close to the speed of light actually – and it usually only takes 10s or 100s of milliseconds for those responses to travel back and forth between server and client. However, one of the few things in the known universe faster than the speed of light is user impatience. So delaying sending more data while you wait for those acknowledgements means you are waiting for at least one back and forth between client and server (known as a round trip), and this introduces an unwanted performance bottleneck. It would be much better if the first batch of sends could contain all the critical data to start the browser on its way to drawing (aka rendering) the web page.
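To get a feel for how those waits add up, here is a rough model that counts how many bursts ("flights") of packets are needed to deliver a response of a given size. It assumes the textbook behaviour described above: no packet loss, and a window that doubles every round trip. The function name and the example sizes are purely illustrative.

```python
def flights_to_deliver(total_bytes, initial_packets=10, payload_per_packet=1460):
    """Count the send bursts needed under idealised slow start.

    Every burst after the first costs one extra round trip spent
    waiting for acknowledgements before more data can be sent.
    """
    if total_bytes <= 0:
        return 0
    window = initial_packets
    sent = 0
    flights = 0
    while sent < total_bytes:
        sent += window * payload_per_packet
        window *= 2  # idealised doubling once the burst is acknowledged
        flights += 1
    return flights


for size in (14_000, 15_000, 100_000, 2_000_000):
    print(size, flights_to_deliver(size))
```

In this model a 14,000-byte response fits in a single flight, while a 15,000-byte one needs a second flight and therefore pays for an extra round trip.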
While 14 KB may not seem a lot in these days of multi-megabyte pages, remember that you are not trying to fit your entire page into that 14 KB limit (though again do if you can!), but only trying to optimise what the browser sees in that first chunk of data. Delivering all your critical resources in the first 14 KB therefore gives you the best chance of maximising the browser's first read and should lead to a faster page, hence why web performance experts have been giving that advice. In an ideal world, that will even be enough to start rendering if you inline critical CSS – which I don't actually like btw! – but even if you can't get it as far as a one round-trip render, getting the browser to start downloading all the required resources as quickly as possible will also help.
However, that advice may not be entirely accurate, and personally I think fixating on that magic 14 KB number isn't actually that helpful. It makes several assumptions that aren't really realistic on the web of today, if they ever were.
First up is the assumption that all of that 14 KB will be used to deliver the HTML. Even in the old world of plaintext HTTP that wasn't the case with HTTP response codes (200 OK) and HTTP response headers taking up some of that. HTTP response headers in particular can be massive! Twitter's Content Security Policy header is approximately 5 KB alone for example. Yes HTTP/2 allows HTTP response headers to be compressed, but that basically works by storing headers from the previous requests and referring to them on subsequent requests. This means the first request - when we are most concerned about this 14 KB limit - pretty much uses the full sized headers – though that's not 100% accurate as HTTP/2 can still use an initial static table for common headers and use Huffman encoding rather than ASCII for slightly smaller headers, but that is only a partial optimisation and the larger benefit is from reuse for second and subsequent requests. Since I've just introduced HTTP/2 that brings the second problem here - HTTPS and HTTP/2. Both of these require some additional messages to be exchanged to establish the connection.
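To put a rough number on the header overhead described above, the sketch below measures an uncompressed HTTP/1.1 status line plus headers and subtracts it from the 14,600-byte budget. The header names are real, but the values are invented, and the Content-Security-Policy value is just a 5,000-byte stand-in rather than any site's actual policy.

```python
INITIAL_WINDOW_BYTES = 10 * 1460  # the 14,600-byte budget from the text


def response_head_size(status_line, headers):
    """Size in bytes of an uncompressed HTTP/1.1 status line plus headers."""
    lines = [status_line] + [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"  # a blank line ends the header block
    return len(head.encode("ascii"))


# Hypothetical headers; the CSP value is a placeholder of roughly 5 KB.
headers = {
    "Content-Type": "text/html; charset=utf-8",
    "Cache-Control": "max-age=0",
    "Content-Security-Policy": "x" * 5000,
}

head_bytes = response_head_size("HTTP/1.1 200 OK", headers)
print("header bytes:", head_bytes)
print("bytes left for HTML:", INITIAL_WINDOW_BYTES - head_bytes)
```

With a header block of that size, over a third of the first 14 KB is gone before a single byte of HTML is sent.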
HTTPS requires two round trips to set up the TLS connection, assuming the most commonly used TLSv1.2 and no session resumption here, as shown in the following handshake diagram:
So that's at least 2 of your 10 TCP packets used up. TLSv1.3 allows 1 round trip (1-RTT) – or even 0 round trips (0-RTT) in certain scenarios – which is one of the big benefits of it. However, I don't think that changes the argument I'm about to make too much, because TLS certificates can also be quite large - easily requiring multiple TCP packets to be sent. So using up 2 TCP packets is probably your best case even for TLSv1.3 and certainly for TLSv1.2.
Then let's assume you're using HTTP/2, as many top sites are now - especially those that care about performance. HTTP/2 is only available over HTTPS (for browsers at least) and requires a few more messages to be exchanged to set up the HTTP/2 connection. For a start, the HTTP/2 connection preface message must be sent by the client first, then a SETTINGS frame MUST be sent by each side, and often one or more WINDOW_UPDATE frames are also sent at the beginning of the connection. Only after all this can the client send the HTTP request. Now it's true that those HTTP/2 frames do not need to be separate TCP packets and also do not need to be acknowledged before client requests can be made, but they do eat into this initial 10 TCP packet limit. Additionally, though it's hardly worth mentioning since it is so small, each HTTP/2 frame is also preceded by a frame header of 9 bytes (plus any extra fixed fields, depending on the exact frame type), which further eats into that 14.25 KB limit – if we're gonna count the TCP packet overhead then it only seems fair to do the same for HTTP/2 frame overhead!
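For a sense of scale, here is a minimal sketch that builds the kind of opening messages a client might send: the fixed connection preface, a SETTINGS frame and a WINDOW_UPDATE frame. The frame layout (24-bit length, 8-bit type, 8-bit flags, 31-bit stream identifier) and the type codes follow the HTTP/2 specification, but the particular settings and the window increment are made-up values, and `build_frame` is a helper invented for this example.

```python
import struct

# Fixed 24-byte client connection preface defined by the HTTP/2 spec.
CLIENT_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"

FRAME_SETTINGS = 0x4
FRAME_WINDOW_UPDATE = 0x8


def build_frame(frame_type, flags, stream_id, payload=b""):
    """Prefix a payload with the 9-byte HTTP/2 frame header."""
    header = struct.pack(">I", len(payload))[1:]  # keep the low 24 bits
    header += struct.pack(">BBI", frame_type, flags, stream_id & 0x7FFFFFFF)
    return header + payload


# Hypothetical settings: each entry is a 16-bit identifier and a 32-bit value.
settings_payload = (
    struct.pack(">HI", 0x3, 100)      # SETTINGS_MAX_CONCURRENT_STREAMS
    + struct.pack(">HI", 0x4, 65535)  # SETTINGS_INITIAL_WINDOW_SIZE
)
settings = build_frame(FRAME_SETTINGS, 0, 0, settings_payload)

# Hypothetical connection-level window increment on stream 0.
window_update = build_frame(FRAME_WINDOW_UPDATE, 0, 0, struct.pack(">I", 1_000_000))

opening = CLIENT_PREFACE + settings + window_update
print(len(CLIENT_PREFACE), len(settings), len(window_update), len(opening))
```

All of that fits comfortably in a single TCP packet, which is why the per-frame overhead is a minor cost compared with the TLS handshake and the response headers.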
So if this 10 packet / 14 KB stat was accurate, then we'd be down to at most half of that (2 packets for the TLS handshake, 2 for the HTTP/2 connection set up responses and 1 for the HTTP response headers, leaving 5 packets), which sounds much worse! However, on the plus side, some of those back and forth messages will have resulted in TCP ACKs, meaning we will have more than 14 KB when we come to sending back the HTML, not less. For example, TLS requires the client to respond during the handshake, which, as we shall soon see, means it can also acknowledge some of those previously sent TCP packets at the same time, increasing the congestion window size, so we will already have increased beyond that 10 packet limit.
There is also the assumption that only 10 TCP packets can be sent. While that is true of most modern operating systems, it is a relatively new change, and previously 1, 2 and then 4 packets were used as this limit. Unless you are running on legacy hardware (and let's assume those worried enough about performance to be looking at this 14 KB limit are not), it's safe to assume that it is at least 14 KB; however, some CDNs have even started going higher than 10 packets. So far I have not seen any proposals to suggest increasing this in general for servers, but again those interested in performance may well be using a CDN.
Additionally, as alluded to earlier, this 14 KB number has always been based on a somewhat flawed assumption - that TCP is such an exact and clean protocol. It's fine to say that TCP stacks can send up to 10 packets without acknowledgement, and that each acknowledgement of all the packets in flight will double the congestion window size, meaning after this initial 10 packets, 20 packets can be sent, then 40 packets, 80 packets, and so on. However, life is rarely that clinical. The reality is that TCP will be acknowledging packets all the time, depending on the TCP stack and the timings involved. The 10 packet limit is just a maximum unacknowledged limit, but quite often an acknowledgement may be sent after, say, 5 packets. Or even after 1. This is especially true when the client has to send data anyway (such as part of the TLS handshake, or as part of setting up the HTTP/2 connection), so the reality is that often (always?) the congestion window will be larger than 10 packets by the time you come to send the HTML anyway.
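A toy model makes the point. In the simplified view of slow start, the congestion window grows by one packet for every packet acknowledged, which is what produces the doubling when a full window is acknowledged at once. The handshake flights and packet counts below are invented for illustration, and real TCP stacks vary in the details (delayed ACKs, for instance), so treat this as a sketch rather than a description of any particular implementation.

```python
def cwnd_after_acks(cwnd, acked_packets):
    """Simplified slow start: grow by one packet per packet acknowledged."""
    return cwnd + acked_packets


# Hypothetical server-side flights sent before the HTML, each of which is
# acknowledged when the client next sends data of its own.
server_flights = [
    ("TLS ServerHello + certificate", 3),
    ("TLS Finished", 1),
    ("HTTP/2 SETTINGS", 1),
]

cwnd = 10  # initial congestion window, in packets
for label, packets in server_flights:
    cwnd = cwnd_after_acks(cwnd, packets)
    print(f"after {label!r} is acknowledged: cwnd = {cwnd} packets")

print("payload bytes available for the HTML burst:", cwnd * 1460)
```

Under these assumptions the window has already grown to 15 packets, so the first burst of HTML has closer to 21 KB available than 14 KB.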