As we all know, HTTPS addresses the security problems of plaintext HTTP transmission, especially man-in-the-middle attacks. Its name originally stood for HTTP over SSL (it is also read as HTTP Secure). SSL stands for Secure Sockets Layer, which was later replaced by TLS (Transport Layer Security). Today, let’s summarize the key points of HTTPS.
This article was first published under the Medium MPP plan. If you are a Medium user, please follow me on Medium. Thank you very much.
HTTPS Versions
People generally refer to the SSL and TLS protocols as the SSL/TLS protocol, but when people mention SSL in daily conversations, they usually mean the TLS protocol.
The TLS protocol has versions 1.0, 1.1, 1.2, and 1.3. TLS 1.2 was the mainstream version for a long time, but TLS 1.3 is now recommended: it reworks the Handshake and Record protocols to make communication more secure and efficient.
In terms of security, TLS 1.3 removes algorithms and cipher modes that TLS 1.2 still allowed but that are now considered insecure, such as RC4, DES, 3DES, AES-CBC, and MD5, which reduces the attack surface.
In terms of performance, TLS 1.3 reduces the number of round trips (RTTs) in the handshake, which speeds up connection establishment. A full TLS 1.3 handshake needs only one round trip, and resumed connections can use the 0-RTT mode, while a full TLS 1.2 handshake needs two round trips.
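The effect of the round-trip counts can be estimated with simple arithmetic. The sketch below is only a rough model: the function name and the 50 ms RTT are assumptions for illustration, and it ignores DNS lookups, packet loss, and server processing time. It counts one round trip for the TCP handshake plus the TLS round trips before the client can send its first HTTP request.

```python
def time_to_first_request_ms(rtt_ms: float, tls_round_trips: int,
                             tcp_round_trips: int = 1) -> float:
    """Estimated delay before the client can send its first HTTP request."""
    return (tcp_round_trips + tls_round_trips) * rtt_ms


RTT_MS = 50.0  # assumed network round-trip time, for illustration only

tls12_full = time_to_first_request_ms(RTT_MS, tls_round_trips=2)
tls13_full = time_to_first_request_ms(RTT_MS, tls_round_trips=1)
tls13_0rtt = time_to_first_request_ms(RTT_MS, tls_round_trips=0)

print(f"TLS 1.2 full handshake: {tls12_full} ms")  # 150.0 ms
print(f"TLS 1.3 full handshake: {tls13_full} ms")  # 100.0 ms
print(f"TLS 1.3 with 0-RTT:     {tls13_0rtt} ms")  # 50.0 ms
```

On a long-distance link where the RTT is several times larger, the saved round trip becomes correspondingly more noticeable.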
Of course, as a well-designed Internet protocol, TLS 1.3 also tries to maximize compatibility with older implementations through extensions carried in the Hello handshake messages, which is not elaborated here.
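As a small illustration of choosing protocol versions in application code, here is a sketch using Python's standard ssl module. It is one possible client configuration, not a recommendation for every deployment: it refuses anything older than TLS 1.2 and lets the handshake negotiate TLS 1.3 when both sides support it.

```python
import ssl

# Client-side context with certificate and hostname verification enabled.
context = ssl.create_default_context()

# Refuse anything older than TLS 1.2. TLS 1.3 is negotiated automatically
# when both the client and the server support it.
context.minimum_version = ssl.TLSVersion.TLSv1_2

print("minimum version:", context.minimum_version.name)
print("TLS 1.3 supported by this build:", ssl.HAS_TLSv1_3)
```

Setting minimum_version to ssl.TLSVersion.TLSv1_3 instead would reject servers that only speak TLS 1.2, so whether that is acceptable depends on the servers you need to reach.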
Core Process of HTTPS
The details differ slightly between versions. Setting aside the rigorous details, HTTPS works as follows.
This diagram by bytebytego is very expressive, showing the key interactions and the core encryption steps. The most important steps are establishing the TCP connection, negotiating a symmetric session key with the help of asymmetric cryptography, and then communicating using symmetric encryption.
HTTPS, or more precisely TLS, is well designed. Its key components are the Record Layer and several sub-protocols. The Record Layer is the data transport channel, and the sub-protocols run on top of it. A record is the basic unit of data transmission in TLS, similar to a TCP segment or an IP packet, which is what the following diagram shows.
The most important of these sub-protocols is the Handshake protocol. This is easier to see in Wireshark after capturing a Client Hello.
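To make the idea of a record more concrete, here is a minimal sketch that decodes the 5-byte header at the start of every TLS record: one byte for the content type, two bytes for the legacy version field, and two bytes for the payload length. The function name and the sample bytes are made up for illustration and are not taken from a real capture.

```python
import struct

# Content types carried by the Record Layer.
CONTENT_TYPES = {
    20: "change_cipher_spec",
    21: "alert",
    22: "handshake",
    23: "application_data",
}


def parse_record_header(data: bytes) -> dict:
    """Decode the 5-byte header of a TLS record."""
    if len(data) < 5:
        raise ValueError("a TLS record header is 5 bytes long")
    # Network byte order: type (1 byte), version (2 bytes), length (2 bytes).
    content_type, major, minor, length = struct.unpack("!BBBH", data[:5])
    return {
        "content_type": CONTENT_TYPES.get(content_type, f"unknown({content_type})"),
        "legacy_version": (major, minor),
        "length": length,
    }


# Hand-written sample: a handshake record with a 512-byte payload.
sample_header = bytes([0x16, 0x03, 0x01, 0x02, 0x00])
header = parse_record_header(sample_header)
print(header)
```

The content type is what tells the receiver which sub-protocol the payload belongs to, which is why the Handshake protocol and application data can share the same channel.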
HTTPS SNI Extension
In the early days of the Internet, individual servers were not that powerful, and early versions of the protocol, such as SSL v2, had design flaws. At that time, the assumption was that a server with one IP address would host only one domain. After DNS resolution, the client could simply connect to the IP address, and the server would present the one certificate for its domain. However, with the explosion of cloud computing and virtual hosting and the scarcity of IPv4 addresses, it became inevitable that one server would host multiple domains. The server then has no way of knowing which domain’s certificate the client expects, and this led to the emergence of SNI.
SNI (Server Name Indication) is an extension of the TLS protocol, which allows the client to send the target hostname information to the server during the handshake process. This way, the server can host multiple domains’ HTTPS services on the same IP address and provide the correct certificate for each domain.
This problem seems simple, but when HTTPS was first being widely adopted and Internet service providers were moving to full-site HTTPS, many CDN vendors did not support SNI. Of course, today in 2024, software such as Nginx and the various vendors all support it.
SNI information is transmitted through the TLS handshake protocol. The packet capture diagram is roughly as follows.
In practice, you can use the -servername option of the openssl s_client subcommand to specify SNI.
If you use the OpenSSL library, you can also set SNI in code with functions such as SSL_set_tlsext_host_name and BIO_set_conn_hostname.
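The same idea can be observed from Python without touching the network. The sketch below uses the standard ssl module with in-memory buffers: it starts a handshake, lets the client write its Client Hello, and then checks that the hostname travels in plaintext inside that message. The hostname example.com is a placeholder, and no server is contacted.

```python
import ssl

HOSTNAME = "example.com"  # placeholder hostname, no connection is made

context = ssl.create_default_context()
incoming = ssl.MemoryBIO()  # bytes from the server (stays empty here)
outgoing = ssl.MemoryBIO()  # bytes the client wants to send

# server_hostname is what ends up in the SNI extension.
tls = context.wrap_bio(incoming, outgoing, server_hostname=HOSTNAME)

try:
    tls.do_handshake()
except ssl.SSLWantReadError:
    # Expected: the Client Hello has been written and the client
    # is now waiting for a reply that never arrives.
    pass

client_hello = outgoing.read()
print("record content type:", client_hello[0])  # 22 means handshake
print("hostname visible in Client Hello:", HOSTNAME.encode("ascii") in client_hello)
```

This is also why SNI is sometimes discussed as a privacy concern: the requested hostname is readable by anyone who can observe the handshake.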
HTTPS Certificate Mechanism
HTTPS combines the asymmetric, symmetric, and hash algorithms of the public key system to provide encryption, decryption, signing, and verification. Together, these basically deliver the four security properties: confidentiality, integrity, authentication, and non-repudiation. They also defend against typical man-in-the-middle (MITM) attacks.
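As one small illustration of the integrity property, the sketch below uses a keyed hash (HMAC-SHA256) from Python's standard library to detect a modified message. This is a simplified demonstration of the principle, not the actual record protection used by TLS, and the key and messages are placeholders.

```python
import hashlib
import hmac

# Placeholder values for illustration only.
shared_key = b"demo-key-known-to-both-sides"
message = b"GET /account HTTP/1.1"

# The sender computes a tag over the message with the shared key.
tag = hmac.new(shared_key, message, hashlib.sha256).digest()


def is_authentic(key: bytes, received: bytes, received_tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = hmac.new(key, received, hashlib.sha256).digest()
    return hmac.compare_digest(expected, received_tag)


untouched_ok = is_authentic(shared_key, message, tag)
tampered_ok = is_authentic(shared_key, b"GET /admin HTTP/1.1", tag)

print("untouched message accepted:", untouched_ok)  # True
print("tampered message accepted:", tampered_ok)    # False
```

An attacker in the middle who changes the message cannot produce a matching tag without the key, so the receiver notices the modification.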
To solve the problem of trusting public keys, certificates and the chain of trust were introduced. A certificate is issued by a third-party Certificate Authority (CA). It is essentially a file, usually stored with an extension such as .crt, .cer, or .pem. Its format is defined by a standard, X.509, and it contains the public key, information about the certificate holder, information about the issuing authority, the validity period, and the CA’s digital signature.
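A .pem file is a Base64 text wrapper around the binary (DER) encoding of a certificate. The sketch below shows that relationship with two helper functions from Python's ssl module. The input bytes are a placeholder and NOT a real certificate, so the result only demonstrates the wrapping and cannot be used for TLS.

```python
import ssl

# Placeholder bytes standing in for a DER-encoded X.509 certificate.
# This is NOT a real certificate; it only demonstrates the encoding.
fake_der = bytes(range(48))

# DER -> PEM: Base64 text between BEGIN/END CERTIFICATE marker lines.
pem_text = ssl.DER_cert_to_PEM_cert(fake_der)
print(pem_text)

# PEM -> DER: strip the markers and decode the Base64 body.
round_tripped = ssl.PEM_cert_to_DER_cert(pem_text)
print("round trip preserved the bytes:", round_tripped == fake_der)
```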
There are some well-known CAs, such as DigiCert, VeriSign, Entrust, and Let’s Encrypt. The certificates they issue are divided into DV (domain validation), OV (organization validation), and EV (extended validation), corresponding to different levels of identity validation. However, a CA also has to be trusted itself. Smaller CAs are trusted because larger CAs sign their certificates, but at the end of the chain there is nobody left to sign, so the top-level CA uses a self-signed certificate, known as a root certificate.
Most operating systems and browsers ship with the root certificates of major CAs built in. During HTTPS communication, the certificate chain is verified link by link until it reaches one of these root certificates.
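The walk up the chain can be modeled in a few lines. The sketch below is a deliberately simplified model: the Cert class, the build_chain function, and all names in it are hypothetical, and it only follows issuer links. Real verification also checks each signature, the validity period, and other constraints.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    """Toy certificate: only the two names needed to follow the chain."""
    subject: str
    issuer: str


def build_chain(leaf, intermediates, trusted_roots, max_depth=8):
    """Follow issuer links from the leaf until a trusted root is reached."""
    chain = [leaf]
    current = leaf
    for _ in range(max_depth):
        if current.issuer in trusted_roots:
            root = trusted_roots[current.issuer]
            if root != current:
                chain.append(root)
            return chain
        parent = intermediates.get(current.issuer)
        if parent is None:
            raise ValueError(f"no trusted path from {leaf.subject!r}")
        chain.append(parent)
        current = parent
    raise ValueError("certificate chain is too long")


# Hypothetical names for illustration.
root = Cert("Example Root CA", "Example Root CA")  # self-signed
intermediate = Cert("Example Intermediate CA", "Example Root CA")
leaf = Cert("www.example.com", "Example Intermediate CA")

trust_store = {root.subject: root}  # built into the OS or browser
sent_by_server = {intermediate.subject: intermediate}

chain = build_chain(leaf, sent_by_server, trust_store)
print(" -> ".join(cert.subject for cert in chain))
```

If the server presents a certificate whose issuer cannot be traced back to the trust store, build_chain raises an error, which corresponds to the certificate warning a browser would show.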
HTTPS Software Ecosystem
Although the HTTPS or TLS ecosystem is rich, OpenSSL dominates the field. It supports almost all publicly available encryption algorithms and protocols and has become the de facto standard. Many applications use it as the underlying library to implement TLS functionality, such as the famous Apache, Nginx, etc.
OpenSSL originated from SSLeay and has since been forked several times, for example into Google’s BoringSSL and OpenBSD’s LibreSSL. OpenSSL is also extremely comprehensive, so a good way to start learning it is with the openssl command. For specific details, you can refer to ChatGPT.
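Many language runtimes also build their TLS support on one of these libraries. As a quick check, the following sketch prints which library and version Python's standard ssl module was compiled against. The output depends on the machine, so none is shown here.

```python
import ssl

# Human-readable name and version of the linked TLS library.
print(ssl.OPENSSL_VERSION)

# The same version as a tuple of five integers.
print(ssl.OPENSSL_VERSION_INFO)
```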
HTTPS Acceleration Solutions
HTTPS is great, but great things come at a cost. Therefore, various optimizations for full-site HTTPS deployment can basically be written as a separate article. Here are some brief points.
The first is reducing round trips, which matters most in I/O-intensive Internet scenarios. This mainly involves protocol upgrades, such as moving to HTTP/3 and TLS 1.3, which reduce RTTs in different ways. The second is improving the performance of each individual step, for example by adding TLS acceleration cards, setting up dedicated TLS clusters or modules, and using techniques such as TLS session resumption.
I have written an article before, sharing why HTTPS is so slow. If you are interested, you can read it here: Why does HTTPS need 7 handshakes and 9 times delay?
References
What’s the difference between HTTP and HTTPS?
how-does-https-work