TCP is a network protocol that we encounter almost every day, and the majority of network connections are established over it. Anyone who has studied computer networks or has some understanding of TCP knows that establishing a TCP connection requires a three-way handshake.
This article was first published on Medium under the MPP plan. If you are a Medium user, please follow me on Medium. Thank you very much.
Many people who have prepared for interviews can briefly describe the process of establishing a TCP connection. However, when asked “Why does TCP require a three-way handshake to establish a connection?”, most people either cannot answer or give an incorrect answer. This article discusses why establishing a TCP connection takes three handshake messages rather than two or four.
Overview
Before analyzing the question at hand, let’s first address a common analogy that has misled many people about the TCP connection process. For a long time, the author of this article also believed that the following exchange was a good explanation of why a TCP connection requires a three-way handshake:
–> Can you hear me?
<– I can hear you. Can you hear me?
–> I can hear you too.
Using analogies to explain a problem often leads to a situation where “nine out of ten analogies are wrong.” If someone uses an analogy to answer your “why” question, you need to carefully consider the flaws in their analogy. Analogies can only provide a partial similarity, and we can never find an absolutely correct analogy. Analogies are only useful when we want to present the characteristics of something in a simple and understandable way. In the rest of the article, we will explain why this analogy is flawed, and readers can read the remaining content with this question in mind.
When many people try to answer or think about this question, they tend to focus on the “three” in the three-way handshake, which is indeed important. However, if we reexamine the question, do we really understand what a “connection” is? Only when we know the definition of a “connection” can we attempt to answer why TCP requires a three-way handshake.
The reliability and flow control mechanisms described above require that TCPs initialize and maintain certain status information for each data stream. The combination of this information, including sockets, sequence numbers, and window sizes, is called a connection.
The passage above comes from RFC 793 (Transmission Control Protocol), which clearly defines what a connection is in TCP. In summary, a connection is the state information that supports the reliability and flow control mechanisms, including sockets, sequence numbers, and window sizes.
Therefore, establishing a TCP connection means that the two communicating parties need to reach a consensus on the three types of information mentioned above. Each socket is composed of an Internet address identifier and a port, and a connection is identified by the pair of sockets at its two ends. The window size is mainly used for flow control. The sequence number tracks the position of each piece of data a party sends, allowing the other party to acknowledge successful receipt of that data and to detect missing or duplicated data.
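To make the definition more concrete, here is a minimal sketch of the state each side might hold once a connection exists. The class and field names (`Socket`, `ConnectionState`, `send_next`, and so on) are hypothetical names chosen for illustration and are not taken from RFC 793 or from any real TCP implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Socket:
    """An endpoint: an Internet address plus a port (illustrative type)."""
    address: str
    port: int


@dataclass
class ConnectionState:
    """Per-connection state that the two sides must agree on."""
    local: Socket
    remote: Socket
    send_next: int      # next sequence number this side will send
    recv_next: int      # next sequence number expected from the peer
    send_window: int    # bytes the peer is currently willing to accept
    recv_window: int    # bytes this side is currently willing to accept


def connection_id(state: ConnectionState) -> tuple:
    """A connection is identified by its pair of sockets."""
    return (state.local, state.remote)


# Example values are made up for illustration.
client_view = ConnectionState(
    local=Socket("10.0.0.1", 50000),
    remote=Socket("10.0.0.2", 80),
    send_next=1001,
    recv_next=5001,
    send_window=65535,
    recv_window=65535,
)
```

The handshake is the step in which both sides fill in these fields consistently: each learns the other's socket, initial sequence number, and advertised window.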
At this point, we have transformed the original question into “Why do we need a three-way handshake to initialize sockets, window sizes, and initial sequence numbers?” Next, we will analyze and seek explanations for this refined question.
Design
This article will mainly discuss why we need a three-way handshake to initialize sockets, window sizes, initial sequence numbers, and establish a TCP connection from the following aspects:
- A three-way handshake is required to prevent the initialization of duplicate historical connections.
- A three-way handshake is required to initialize the initial sequence numbers of both communicating parties.
- Discuss the possibility of establishing a connection with a different number of handshakes.
Among these arguments, the first one is the primary reason why TCP chooses to use a three-way handshake. The other reasons are secondary in comparison. We discuss them here to provide a more comprehensive perspective and understand this interesting design decision from multiple angles.
Historical Connections
RFC 793 (Transmission Control Protocol) clearly states the primary reason why TCP uses a three-way handshake, which is to prevent confusion caused by old duplicate connection initiations:
The principle reason for the three-way handshake is to prevent old duplicate connection initiations from causing confusion.
Imagine this scenario: if a connection were established with only two communications, the sender could not retract a connection request once it had been sent. Under complex or poor network conditions, the sender may send several connection requests in a row, and the receiver can only accept or reject each one as it arrives. The receiver has no way of knowing whether a given request is an expired one that was delayed by network congestion.
Therefore, TCP chooses to use a three-way handshake to establish a connection and introduces the RST control message. When the receiver receives the request, it sends the sender’s SEQ+1 as part of the ACK control message. At this point, the sender can determine whether the current connection is a historical connection:
- If the current connection is a historical connection, meaning the SEQ has expired or timed out, the sender will directly send an RST control message to terminate this connection.
- If the current connection is not a historical connection, the sender will send an ACK control message, and the two parties will successfully establish a connection.
By using a three-way handshake and the RST control message, the ultimate control over whether to establish a connection is given to the sender. Only the sender has enough context to determine if the current connection is erroneous or expired. This is also the primary reason why TCP uses a three-way handshake to establish a connection.
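The decision described in this section can be sketched as a small simulation. This is a simplified model under stated assumptions, not a real TCP state machine: the `Segment` class and the two handler functions are hypothetical names, the sequence values are illustrative, and timers, retransmission, and most states are left out.

```python
from dataclasses import dataclass

SEQ_MOD = 2 ** 32  # sequence numbers wrap around at 32 bits


@dataclass
class Segment:
    flags: frozenset   # e.g. frozenset({"SYN", "ACK"})
    seq: int = 0
    ack: int = 0


def receiver_on_syn(syn: Segment, receiver_isn: int) -> Segment:
    """The receiver cannot tell old SYNs from new ones, so it always
    answers with SYN+ACK, acknowledging the sender's SEQ + 1."""
    return Segment(frozenset({"SYN", "ACK"}), seq=receiver_isn,
                   ack=(syn.seq + 1) % SEQ_MOD)


def sender_on_syn_ack(reply: Segment, current_isn: int) -> Segment:
    """Only the sender knows which ISN it is currently using."""
    if reply.ack == (current_isn + 1) % SEQ_MOD:
        # The reply matches the live attempt: complete the handshake.
        return Segment(frozenset({"ACK"}), seq=reply.ack,
                       ack=(reply.seq + 1) % SEQ_MOD)
    # The reply acknowledges a stale SYN: reset that half-open connection.
    return Segment(frozenset({"RST"}), seq=reply.ack)


old_syn = Segment(frozenset({"SYN"}), seq=90)    # delayed duplicate
new_syn = Segment(frozenset({"SYN"}), seq=100)   # the live attempt

stale_reply = receiver_on_syn(old_syn, receiver_isn=300)
fresh_reply = receiver_on_syn(new_syn, receiver_isn=300)

print(sorted(sender_on_syn_ack(stale_reply, current_isn=100).flags))  # ['RST']
print(sorted(sender_on_syn_ack(fresh_reply, current_isn=100).flags))  # ['ACK']
```

Note that `receiver_on_syn` behaves identically for both SYNs. The third message is what lets the sender, which holds the missing context, accept or reject the attempt.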
Initial Sequence Numbers
Another important reason for using a three-way handshake is that both communicating parties need to obtain an initial sequence number for sending information. As a reliable transport layer protocol, TCP needs to build a reliable transport layer in an unstable network environment. The uncertainty of the network can lead to issues such as packet loss and out-of-order delivery. Common problems may include:
- Data packets being repeatedly sent by the sender, resulting in duplicate data.
- Data packets being lost during transmission due to routing or other network nodes.
- Data packets arriving at the receiver may not be in the order they were sent.
To address these potential issues, the TCP protocol requires the sender to include a “sequence number” field in each data packet. With a sequence number attached to each data packet:
- The receiver can deduplicate repeated data packets based on the sequence number.
- The sender will resend the corresponding data packet until it is acknowledged.
- The receiver can reorder the data packets based on their sequence numbers.
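The three uses of sequence numbers listed above can be illustrated with a small receiver-side sketch. It makes simplifying assumptions: sequence numbers do not wrap around, segments never partially overlap, and the function name `reassemble` is hypothetical rather than part of any real TCP stack.

```python
def reassemble(segments, initial_seq):
    """Deliver payload bytes in order, ignoring duplicates.

    `segments` is an iterable of (seq, payload) pairs in arrival order.
    Returns the in-order byte stream and the next expected sequence number.
    """
    pending = {}            # out-of-order segments, keyed by sequence number
    expected = initial_seq
    delivered = bytearray()

    for seq, payload in segments:
        if seq < expected or seq in pending:
            continue        # duplicate: already delivered or already buffered
        pending[seq] = payload
        # Drain every buffered segment that is now contiguous.
        while expected in pending:
            chunk = pending.pop(expected)
            delivered += chunk
            expected += len(chunk)

    return bytes(delivered), expected


# Segments arrive out of order and with duplicates.
arrivals = [(105, b"world"), (100, b"hello"), (100, b"hello"), (105, b"world")]
data, next_expected = reassemble(arrivals, initial_seq=100)
print(data, next_expected)  # b'helloworld' 110
```

The returned `next_expected` value plays the role of a cumulative acknowledgment: if it stops advancing, the sender knows which data to resend. All of this only works if both sides agree on `initial_seq`, which is what the handshake establishes.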
Sequence numbers play a crucial role in TCP connections, and the initial sequence number, as part of a TCP connection, needs to be initialized during the three-way handshake. Since both parties in a TCP connection need to obtain the initial sequence number, they need to send a SYN control message to each other, carrying their expected initial sequence number SEQ. Upon receiving the SYN message, the receiver will confirm it using the ACK control message and SEQ+1.
As shown in the above diagram, the two TCPs, A and B, send SYN and ACK control messages to each other. After both parties obtain their expected initial sequence numbers, they can start communication. Due to the design of the TCP message header, we can combine the two middle communications into one. TCP B can send both the ACK and SYN control messages to TCP A simultaneously, reducing the four communications to three.
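The reduction from four messages to three can be sketched as follows. The flag values match the control bits of the TCP header, but the tuple layout `(direction, flags, seq, ack)` and the function names are assumptions made only for this illustration.

```python
import enum


class Flag(enum.IntFlag):
    """Control bits as they appear in the TCP header."""
    FIN = 0x01
    SYN = 0x02
    RST = 0x04
    ACK = 0x10


def four_step(isn_a, isn_b):
    """Each direction is synchronized and acknowledged separately."""
    return [
        ("A->B", Flag.SYN, isn_a, None),       # my sequence number is X
        ("B->A", Flag.ACK, None, isn_a + 1),   # your sequence number is X
        ("B->A", Flag.SYN, isn_b, None),       # my sequence number is Y
        ("A->B", Flag.ACK, None, isn_b + 1),   # your sequence number is Y
    ]


def merge_adjacent(messages):
    """Merge consecutive messages that travel in the same direction."""
    merged = []
    for direction, flags, seq, ack in messages:
        if merged and merged[-1][0] == direction:
            _, prev_flags, prev_seq, prev_ack = merged[-1]
            merged[-1] = (direction, prev_flags | flags,
                          seq if seq is not None else prev_seq,
                          ack if ack is not None else prev_ack)
        else:
            merged.append((direction, flags, seq, ack))
    return merged


handshake = merge_adjacent(four_step(isn_a=100, isn_b=300))
print(len(handshake))  # 3
```

Because SYN and ACK are independent bits in the same header field, and the header has separate sequence and acknowledgment fields, the two middle messages fit into one segment without losing information.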
A three-way handshake is necessary because sequence numbers are not tied to a global clock in the network, and TCPs may have different mechanisms for picking the ISN’s. The receiver of the first SYN has no way of knowing whether the segment was an old delayed one or not unless it remembers the last sequence number used on the connection (which is not always possible), and so it must ask the sender to verify this SYN. The three-way handshake and the advantages of a clock-driven scheme are discussed in [3].
Furthermore, the network is a distributed system with no global clock, and each TCP implementation may initialize its sequence numbers using a different mechanism. The receiver of a connection request therefore cannot determine whether the initial sequence number it received has expired, so it must ask the sender to make that determination. Having the receiver store and verify past sequence numbers is not always practical, which reinforces the point made in the previous section: the handshake must prevent stale historical connections from being initialized.
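As one example of such a mechanism, RFC 793 describes a clock-driven generator tied to a 32-bit counter whose low-order bit advances roughly every 4 microseconds. The sketch below only illustrates that idea; the function names are hypothetical, the local monotonic clock stands in for the counter, and real implementations typically add further randomization.

```python
import time

SEQ_MOD = 2 ** 32          # the counter is 32 bits wide
TICK_MICROSECONDS = 4      # one increment roughly every 4 microseconds


def isn_from_clock(microseconds: int) -> int:
    """Map an elapsed time in microseconds to a 32-bit sequence number."""
    return (microseconds // TICK_MICROSECONDS) % SEQ_MOD


def current_isn() -> int:
    """Pick an ISN from this host's own clock (not shared with the peer)."""
    return isn_from_clock(time.monotonic_ns() // 1000)
```

Two hosts calling `current_isn()` at the same moment will generally get unrelated values, which is why neither side can judge the freshness of the other's sequence number on its own.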
Number of Communications
When discussing the number of communications required to establish a TCP connection, we usually ask why it takes three rather than two or four. Discussing more than three is rarely meaningful, because the same information can always be exchanged using more messages. It is therefore technically possible to establish a connection using four, five, or even more communications.
What we actually want is to complete the information exchange with the fewest possible communications, the theoretical minimum. This is why the previous sections repeatedly emphasize that a “two-way handshake” cannot establish a TCP connection, and that three is the minimum number of communications required.
Conclusion
In this article, we discussed why TCP requires a three-way handshake to establish a connection. Before analyzing this question in detail, we first reconsidered what a TCP connection is. RFC 793 (Transmission Control Protocol) provides a clear definition of a TCP connection: the data used for ensuring reliability and flow control mechanisms, including sockets, sequence numbers, and window sizes.
The three-way handshake in TCP can effectively prevent the initiation of erroneous historical connections and reduce unnecessary resource consumption for both communicating parties. The three-way handshake helps both parties obtain the initial sequence numbers, ensuring that data packets are transmitted without duplication or loss and maintaining their order. At this point, it is clear why “two-way handshake” and “four-way handshake” are not used:
- “Two-way handshake”: It cannot prevent the initialization of erroneous historical connections and wastes resources for the receiver.
- “Four-way handshake”: The design of the TCP protocol allows us to simultaneously transmit both the ACK and SYN control messages, reducing the number of communications. Therefore, there is no need to use more communications to transmit the same information.
Returning to the question raised at the beginning of the article, why is the “Can you hear me?” analogy a poor explanation of TCP’s three-way handshake? Mainly because it does not address the core issue: avoiding the initialization of historical duplicate connections.