HTTP (Hypertext Transfer Protocol) has become the most widely used application layer protocol on the Internet. However, it is primarily a network protocol for transferring hypertext and provides no security guarantees. Transmitting data packets in plaintext over the Internet makes eavesdropping and man-in-the-middle attacks possible. Transmitting passwords over HTTP is essentially the same as running naked on the Internet.

This article was first published in the Medium MPP plan.

In 1994, Netscape designed the HTTPS (Hypertext Transfer Protocol Secure) protocol, which uses the Secure Sockets Layer (SSL) to ensure secure data transmission. With the development of the Transport Layer Security (TLS) protocol, we have replaced the deprecated SSL protocol with TLS, although the term “SSL certificate” is still used.

HTTPS is an extension of the HTTP protocol that allows us to transmit data over the Internet securely. However, the first request on a new HTTPS connection needs 4.5 round-trip times (RTT) before the client receives a response. This article explains the request and response process in detail and analyzes why HTTPS needs 4.5 RTT to obtain a response from the service provider:

TCP Protocol: Both communication parties establish a TCP connection through a three-way handshake.
TLS Protocol: Both communication parties establish a TLS connection through a four-way handshake.
HTTP Protocol: The client sends a request to the server, and the server responds.

The analysis is based on specific versions of protocol implementations and common scenarios. With the advancement of network technology, we can reduce the number of required network communications. This article will mention some common optimization solutions in the corresponding sections.

TCP

As an application layer protocol, HTTP relies on a lower-level transport layer protocol to provide basic data transmission functionality. TCP is commonly used as the underlying protocol for HTTP. To prevent the establishment of erroneous historical connections, TCP communication parties perform a three-way handshake to establish a TCP connection⁶. Let’s briefly review the entire process of establishing a TCP connection.

The client sends a segment with the SYN flag and the initial sequence number of the data segment (SEQ = M) to the server.
Upon receiving the segment, the server sends a segment with the SYN and ACK flags to the client:
- The server acknowledges the initial sequence number of the client’s data segment by returning ACK = M+1.
- The server notifies the client of the initial sequence number of the server’s data segment by sending SEQ = N.
The client sends a segment with the ACK flag to the server, confirming the server’s initial sequence number, including ACK = N+1.

Through the three-way handshake, the TCP connection parties determine the initial sequence number, window size, and maximum segment size of the TCP connection. This allows the communication parties to ensure that the data segments are not duplicated or missed, control the flow through the window size, and avoid IP fragmentation by using the maximum segment size.
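The sequence and acknowledgment numbers in the three steps above can be traced with a small simulation. This is only a sketch: `Segment` and `three_way_handshake` are hypothetical names introduced for illustration, no real packets are sent, and the only detail added beyond the text is that TCP sequence numbers wrap around at 2^32.

```python
from typing import List, NamedTuple, Optional, Tuple

SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


class Segment(NamedTuple):
    sender: str
    flags: Tuple[str, ...]
    seq: Optional[int]
    ack: Optional[int]


def three_way_handshake(client_isn: int, server_isn: int) -> List[Segment]:
    """Return the three segments, with M = client_isn and N = server_isn."""
    # Step 1: the client announces its initial sequence number (SEQ = M).
    syn = Segment("client", ("SYN",), seq=client_isn, ack=None)
    # Step 2: the server acknowledges M + 1 and announces SEQ = N.
    syn_ack = Segment("server", ("SYN", "ACK"),
                      seq=server_isn, ack=(syn.seq + 1) % SEQ_SPACE)
    # Step 3: the client acknowledges N + 1.
    ack = Segment("client", ("ACK",),
                  seq=(syn.seq + 1) % SEQ_SPACE,
                  ack=(syn_ack.seq + 1) % SEQ_SPACE)
    return [syn, syn_ack, ack]


if __name__ == "__main__":
    for segment in three_way_handshake(client_isn=1000, server_isn=5000):
        print(segment)
```

Three segments cross the network one after another, which is where the 1.5 RTT cost of the TCP handshake comes from.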

The original version of the TCP protocol required three-way communication to establish a TCP connection, and in most current scenarios the three-way handshake is still unavoidable. However, TCP Fast Open (TFO), proposed in 2014, allows a client that has connected to the server before to send data together with its first SYN, so in those scenarios the request no longer waits for the handshake to complete.

The TFO strategy uses a TFO Cookie stored on the client to establish a connection with the server quickly. The first time the client initiates a TCP connection to the server, it includes the TFO option in the SYN message, and the server generates a Cookie and returns it to the client in the SYN-ACK. The client caches the Cookie. When reconnecting, it sends the Cookie together with its data in the SYN. After verifying the Cookie, the server replies with SYN and ACK and can start processing the data immediately, which reduces the number of communications required before data is exchanged.
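The cookie logic on the server side can be sketched as follows. This is a toy model under stated assumptions, not how any particular kernel implements TFO: deriving the cookie as a truncated HMAC of the client IP is an illustrative choice, and `SERVER_SECRET`, `make_cookie`, and `handle_syn` are hypothetical names.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

SERVER_SECRET = os.urandom(16)  # hypothetical per-server secret key


def make_cookie(client_ip: str, secret: bytes = SERVER_SECRET) -> bytes:
    """Derive a cookie bound to the client's IP (assumed scheme: truncated HMAC)."""
    return hmac.new(secret, client_ip.encode(), hashlib.sha256).digest()[:8]


def handle_syn(client_ip: str, cookie: Optional[bytes], payload: bytes,
               secret: bytes = SERVER_SECRET) -> Tuple[Optional[bytes], Optional[bytes]]:
    """Return (data accepted from the SYN, cookie to send back in the SYN-ACK)."""
    expected = make_cookie(client_ip, secret)
    if cookie is not None and hmac.compare_digest(cookie, expected):
        # Valid cookie: the data carried in the SYN is accepted right away.
        return payload, None
    # No cookie or an invalid one: ignore the data, fall back to the regular
    # handshake, and hand the client a cookie for its next connection.
    return None, expected


if __name__ == "__main__":
    ip = "203.0.113.7"
    data, cookie = handle_syn(ip, None, b"GET / HTTP/1.1")   # first visit
    print(data, cookie.hex())
    data, _ = handle_syn(ip, cookie, b"GET / HTTP/1.1")      # reconnect
    print(data)
```

Binding the cookie to the client address is what lets the server accept data before the handshake completes without trusting arbitrary senders.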

TLS

The purpose of TLS is to build a secure transmission channel on top of the reliable TCP protocol. However, TLS itself does not provide reliability guarantees, so we still need a reliable transport layer protocol underneath. After establishing a reliable TCP connection between the communication parties, we need to exchange keys through the TLS handshake. Here, we will explain the connection establishment process of TLS 1.2:


The client sends a Client Hello message to the server, including the client’s supported protocol version, encryption algorithms, compression algorithms, and client-generated random number.
Upon receiving the information about the client’s supported protocol version and encryption algorithms, the server:
- Sends a Server Hello message to the client, specifying the chosen protocol version, encryption method, session ID, and server-generated random number.
- Sends a Certificate message to the client, which includes the server’s certificate chain, including information about supported domains, issuers, and expiration dates.
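- Sends a Server Key Exchange message carrying the server’s key exchange parameters (for example, a Diffie-Hellman public key) together with a signature over them. This message is not sent when RSA key exchange is used.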
- Optionally sends a CertificateRequest message, requesting the client’s certificate for verification.
- Sends a Server Hello Done message to the client, indicating that all relevant information has been sent.
Upon receiving the server’s protocol version, encryption method, session ID, and certificate, the client verifies the server’s certificate and then:
- Sends a Client Key Exchange message to the server. With RSA key exchange, it carries the pre-master secret, a random string encrypted with the server’s public key; with Diffie-Hellman key exchange, it carries the client’s public key instead, and both parties compute the pre-master secret themselves.
- Sends a Change Cipher Spec message to the server, indicating that subsequent data segments will be encrypted.
- Sends a Finished message to the server. It is the first encrypted message and contains a verification value computed over the preceding handshake messages.
Upon receiving the Change Cipher Spec and Finished messages from the client, the server:
- Sends a Change Cipher Spec message to the client, indicating that subsequent data segments will be encrypted.
- Sends a Finished message to the client, verifying the client’s Finished message and completing the TLS handshake.
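To see why this exchange is called a four-way handshake, the messages above can be grouped into flights, where a flight is a run of consecutive messages from the same sender. The sketch below is only a counting aid: the message list mirrors the steps above without the optional CertificateRequest, and `flights` and `handshake_rtt` are hypothetical helper names.

```python
from itertools import groupby

# The TLS 1.2 handshake messages in the order described above.
TLS12_HANDSHAKE = [
    ("client", "ClientHello"),
    ("server", "ServerHello"),
    ("server", "Certificate"),
    ("server", "ServerKeyExchange"),
    ("server", "ServerHelloDone"),
    ("client", "ClientKeyExchange"),
    ("client", "ChangeCipherSpec"),
    ("client", "Finished"),
    ("server", "ChangeCipherSpec"),
    ("server", "Finished"),
]


def flights(messages):
    """Group consecutive messages from the same sender into one flight."""
    return [(sender, [name for _, name in group])
            for sender, group in groupby(messages, key=lambda m: m[0])]


def handshake_rtt(messages):
    """Each flight crosses the network once, so two flights make one RTT."""
    return len(flights(messages)) / 2


if __name__ == "__main__":
    for sender, names in flights(TLS12_HANDSHAKE):
        print(sender, "->", ", ".join(names))
    print("RTT:", handshake_rtt(TLS12_HANDSHAKE))
```

Ten messages collapse into four flights, two in each direction, which gives the 2 RTT cost of the TLS 1.2 handshake.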

The core of the TLS handshake is deriving a negotiated key from the random numbers generated by both parties and the pre-master secret, which is protected by the server’s public key. Both parties then use this symmetric key to encrypt messages, which prevents eavesdropping and man-in-the-middle attacks and ensures secure communication.
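As a rough illustration of how both parties arrive at the same key, the sketch below derives a TLS 1.2 master secret from the pre-master secret and the two random values, following the pseudorandom function defined in RFC 5246. It assumes a cipher suite whose PRF is based on SHA-256, uses random bytes in place of real handshake values, and is meant for study rather than production use.

```python
import hashlib
import hmac
import os


def p_sha256(secret: bytes, seed: bytes, length: int) -> bytes:
    """P_hash from RFC 5246: expand secret and seed into `length` bytes."""
    output = b""
    a = seed  # A(0) = seed
    while len(output) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()  # A(i) = HMAC(secret, A(i-1))
        output += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return output[:length]


def master_secret(pre_master: bytes, client_random: bytes, server_random: bytes) -> bytes:
    """master_secret = PRF(pre_master, "master secret", client_random + server_random)."""
    return p_sha256(pre_master, b"master secret" + client_random + server_random, 48)


if __name__ == "__main__":
    client_random = os.urandom(32)   # sent in Client Hello
    server_random = os.urandom(32)   # sent in Server Hello
    pre_master = os.urandom(48)      # known to both sides after the key exchange
    client_side = master_secret(pre_master, client_random, server_random)
    server_side = master_secret(pre_master, client_random, server_random)
    print(client_side == server_side, len(client_side))
```

Because the two random values travel in plaintext, the secrecy of the result rests entirely on the pre-master secret, which is why the key exchange step protects it.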

In TLS 1.2, establishing a TLS connection takes 2 RTT. TLS 1.3 streamlines the handshake to 1 RTT, which significantly reduces the time required: after 1 RTT, the client can already send application-layer data to the server.
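To check which protocol version a client actually negotiates, Python's standard `ssl` module can be configured to refuse anything older than TLS 1.3. This is a minimal sketch: `tls13_only_context` and `negotiated_version` are hypothetical helper names, the host passed to `negotiated_version` is up to the reader, and the second function needs network access and a server that supports TLS 1.3.

```python
import socket
import ssl


def tls13_only_context() -> ssl.SSLContext:
    """Build a client context that only accepts TLS 1.3 handshakes."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_3
    return context


def negotiated_version(host: str, port: int = 443) -> str:
    """Connect to host and return the negotiated version, e.g. 'TLSv1.3'."""
    context = tls13_only_context()
    with socket.create_connection((host, port), timeout=5) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version()
```

If the server only supports TLS 1.2, the handshake fails with an `ssl.SSLError` instead of silently falling back to the slower 2 RTT handshake.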

We won’t go into detail about the TLS 1.3 connection establishment process here. In addition to reducing the network overhead in regular handshakes, TLS 1.3 introduces a 0-RTT connection establishment process. 60% of network connections are established when users first visit a website or after a certain interval, while the remaining 40% can be addressed using the 0-RTT strategy of TLS 1.3. However, this strategy, similar to TFO, carries some security risks and should be used with consideration for specific business scenarios.

HTTP

Transmitting data over a well-established TCP and TLS channel is relatively straightforward. The HTTP protocol can directly utilize the reliable and secure channel established at the lower layers to transmit data. The client writes data to the server using the TCP socket interface, and the server responds through the same means after processing the data. Since the entire process involves the client sending a request and the server returning a response, it takes 1 RTT.

The data exchange in the HTTP protocol consumes only 1 RTT. For a single HTTP request, there is nothing left to optimize within the HTTP protocol itself. However, as the number of requests increases, reusing an established connection (persistent connections in HTTP/1.1, or multiplexing many requests over one connection in HTTP/2) avoids the additional overhead of repeated TCP and TLS handshakes.
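The benefit of reusing a connection can be estimated with a few lines of arithmetic. The sketch below assumes the figures used in this article (1.5 RTT for TCP, 2 RTT for TLS 1.2, 1 RTT per request) and assumes that requests are issued one after another; `total_rtt` is a hypothetical helper name.

```python
TCP_HANDSHAKE_RTT = 1.5    # three-way handshake
TLS12_HANDSHAKE_RTT = 2.0  # four-way handshake
HTTP_EXCHANGE_RTT = 1.0    # one request and one response


def total_rtt(requests: int, reuse_connection: bool) -> float:
    """Total RTTs needed for `requests` sequential HTTPS requests."""
    setup = TCP_HANDSHAKE_RTT + TLS12_HANDSHAKE_RTT
    if reuse_connection:
        # The handshakes are paid once, then every request costs 1 RTT.
        return setup + requests * HTTP_EXCHANGE_RTT
    # Every request opens a new connection and pays the handshakes again.
    return requests * (setup + HTTP_EXCHANGE_RTT)


if __name__ == "__main__":
    for n in (1, 10):
        print(n, total_rtt(n, reuse_connection=False), total_rtt(n, reuse_connection=True))
```

For a single request the two strategies cost the same 4.5 RTT, while for ten sequential requests reuse needs 13.5 RTT instead of 45.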

Summary

When a client accesses a server via HTTPS, the first request requires 7 handshakes and takes 4.5 RTT, that is, 9 times the one-way latency. If the RTT is approximately 40ms, the first request takes ~180ms. However, if physical distance pushes the RTT to approximately 200ms, as it can when accessing a distant server such as one in the United States, the HTTPS request takes ~900ms, which is a significant delay. Let’s summarize why the HTTPS protocol needs 9 times the one-way latency to complete communication:

The TCP protocol requires a three-way handshake to establish a reliable TCP connection (1.5 RTT).
The TLS protocol establishes a TLS connection over TCP through a four-way handshake to ensure communication security (2 RTT).
The HTTP protocol sends a request and receives a response over TCP and TLS in one round trip (1 RTT).
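The breakdown above can be turned into a small calculation that reproduces the figures quoted in this summary. It is only a model of the scenario analyzed in this article (TCP, TLS 1.2, one HTTP exchange, no optimizations), and the names `PHASES_RTT`, `one_way_delays`, and `first_request_latency_ms` are hypothetical.

```python
# RTT cost of each phase in the scenario analyzed above.
PHASES_RTT = {
    "TCP three-way handshake": 1.5,
    "TLS 1.2 four-way handshake": 2.0,
    "HTTP request and response": 1.0,
}


def one_way_delays(phases=PHASES_RTT) -> float:
    """One RTT equals two one-way delays."""
    return sum(phases.values()) * 2


def first_request_latency_ms(rtt_ms: float, phases=PHASES_RTT) -> float:
    """Time until the first HTTPS response arrives, in milliseconds."""
    return sum(phases.values()) * rtt_ms


if __name__ == "__main__":
    print(one_way_delays())
    for rtt in (40, 200):
        print(rtt, first_request_latency_ms(rtt))
```

Swapping an entry in `PHASES_RTT`, for example setting the TLS phase to 1.0 for TLS 1.3, shows how each optimization mentioned earlier shortens the first request.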

It is important to note that the calculations of round-trip delay in this article are based on specific scenarios and protocol versions. Network protocols are constantly evolving, and issues that were initially overlooked are often addressed through patch updates. However, in the end, a complete rewrite from the ground up is still necessary.

HTTP/3 is an example of this. It uses the UDP-based QUIC protocol for handshakes, combining the TCP and TLS handshake processes to reduce the 7 handshakes to 3. It directly establishes a reliable and secure transmission channel, reducing the time required at a 200ms RTT from ~900ms to ~500ms. We will cover HTTP/3-related content in future articles. Finally, here are some open-ended questions for interested readers to explore:

What are the similarities and differences between the QUIC protocol and the TCP protocol as transport layer protocols?
How is it possible to establish a client-server connection using 0-RTT?
