SNI to ESNI to ECH- The journey of the encrypted handshake

Dhruv Rauthan
May 12, 2021

The Internet was not built with security in mind. It was, and is, meant to be accessible to anyone and everyone around the world. As that becomes a reality, malicious actors try to exploit the weaknesses left behind in its design. The past decades have been spent patching these holes with protocols such as TLS and HTTPS. Most communication on the modern Internet is now encrypted with keys that the two endpoints agree on through a cryptographic exchange called the Transport Layer Security (TLS) handshake.

SNI- Server Name Indication

SNI was standardized as a TLS extension in 2003 (RFC 3546). It allows a server to host multiple domains behind a single IP address, which helped ease the shortage of public IPv4 addresses (a problem that IPv6 is meant to solve in the long run).

Suppose Alice mails a letter to Bob. The mailman delivers the letter to Bob’s office, but only the receptionist who receives it knows who Bob is and where he sits, and passes the letter along to him.

Here, Alice is the client, the mailman is the protocol used, the receptionist is the server and Bob is the website. When the server receives the packet, it forwards the request to the destination website by looking at the server_name (SNI) extension in the packet.

Credit: ssl2buy.com

The server_name is an extension in the ClientHello, the first message the client sends in the TLS handshake. It contains the name of the website the client is trying to connect to. When this message reaches the server, the server looks at the SNI field and returns the TLS (SSL) certificate of the website the client wants to visit. The client examines the certificate, verifies its authenticity, and the connection proceeds as normal.

You can see how this eases the problem of depleting IP addresses. Many websites can share one IP address: the client sends its ClientHello with the SNI extension to that address, and the server reads the field and responds with the matching website’s certificate.
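To make the receptionist's job concrete, here is a minimal Python sketch of how the server_name extension is laid out on the wire (following the structure in RFC 6066) and how a server could use it to pick a certificate. The hostnames, the certificate paths and the function names are made up for illustration, and a real server would parse the whole ClientHello rather than a lone extension.

```python
import struct

SERVER_NAME_EXT = 0   # extension type assigned to server_name (RFC 6066)
HOST_NAME_TYPE = 0    # name_type value for a DNS hostname


def build_sni_extension(hostname: str) -> bytes:
    """Encode a server_name extension carrying a single hostname."""
    name = hostname.encode("ascii")
    # One ServerName entry: name_type (1 byte) + length (2 bytes) + name.
    entry = struct.pack("!BH", HOST_NAME_TYPE, len(name)) + name
    # The entry is wrapped in a length-prefixed server_name_list.
    name_list = struct.pack("!H", len(entry)) + entry
    # Finally the generic extension header: type (2 bytes) + length (2 bytes).
    return struct.pack("!HH", SERVER_NAME_EXT, len(name_list)) + name_list


def parse_sni_extension(ext: bytes) -> str:
    """Return the hostname from an extension built by build_sni_extension."""
    ext_type, ext_len = struct.unpack_from("!HH", ext, 0)
    if ext_type != SERVER_NAME_EXT or ext_len != len(ext) - 4:
        raise ValueError("not a well-formed server_name extension")
    name_type, name_len = struct.unpack_from("!BH", ext, 6)
    if name_type != HOST_NAME_TYPE:
        raise ValueError("unsupported name type")
    return ext[9:9 + name_len].decode("ascii")


# Hypothetical certificate store: one IP address, several websites.
CERTIFICATES = {
    "alice.example": "certs/alice.pem",
    "bob.example": "certs/bob.pem",
}


def select_certificate(ext: bytes) -> str:
    """Pick the certificate for the website named in the extension."""
    return CERTIFICATES[parse_sni_extension(ext)]


extension = build_sni_extension("bob.example")
print(parse_sni_extension(extension), "->", select_certificate(extension))
```

Notice that nothing in this encoding is secret: the hostname sits in the extension as plain ASCII bytes.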

ESNI- Encrypted Server Name Indication

However, the SNI field is sent completely unencrypted, so anyone on the network path can observe which destination the client wants to reach. Further, some organizations’ firewalls block websites by inspecting this field, which restricts the user’s access to the open internet. Thus, hiding this field became a priority.
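You can check this for yourself without touching the network. The sketch below uses Python's standard ssl module with in-memory buffers to produce the first flight of a handshake and then searches the raw bytes for the hostname. The hostname is a placeholder and the helper name client_hello_bytes is my own; the handshake is never completed, since there is no server on the other end.

```python
import ssl


def client_hello_bytes(hostname: str) -> bytes:
    """Return the raw bytes a TLS client would send first for `hostname`."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    # server_hostname is what ends up in the server_name (SNI) extension.
    tls = context.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        # Expected: the client has written its ClientHello and is now
        # waiting for a reply that will never arrive in this sketch.
        pass
    return outgoing.read()


hello = client_hello_bytes("secret-site.example")
# An on-path observer sees exactly these bytes, hostname included.
print(b"secret-site.example" in hello)
```

If the hostname shows up in the captured bytes, as it should with an ordinary TLS client, that is precisely what a firewall or eavesdropper gets to read.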

How do we encrypt SNI? This initially looked like a chicken-and-egg problem: the client and the server agree on a session key through the handshake, but the handshake starts unencrypted, and we can’t encrypt anything without a key. So how do we get a key early enough? ESNI solved this using DNS: the server’s public key is published in a DNS record for the domain, and the client fetches it alongside the IP address when it resolves the domain name. The client uses this public key to encrypt the SNI field, and the server decrypts it with the corresponding private key.
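The flow can be sketched in a few lines of Python. This is a toy model and not the actual ESNI construction: it assumes a Diffie-Hellman style exchange in which the client combines its own fresh key pair with the server's published public key, and it uses a tiny group and a hash-based XOR cipher purely so the example runs with the standard library. The DNS_RECORDS dictionary, the record field names and the addresses are all hypothetical.

```python
import hashlib
import secrets

# Toy Diffie-Hellman group: for illustration only, NOT secure parameters.
P = 2**127 - 1
G = 3


def keypair():
    """Return a (private, public) pair in the toy group."""
    private = secrets.randbelow(P - 2) + 1
    return private, pow(G, private, P)


def derive_key(private: int, peer_public: int) -> bytes:
    """Both sides reach the same secret from their private key and the peer's public key."""
    shared = pow(peer_public, private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


def xor_with_keystream(key: bytes, data: bytes) -> bytes:
    """Toy cipher: XOR data with a SHA-256 based keystream (same call decrypts)."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# The server publishes its public key next to its address in DNS (hypothetical record).
server_private, server_public = keypair()
DNS_RECORDS = {"bob.example": {"ip": "192.0.2.10", "esni_key": server_public}}

# Client side: resolve the name, then encrypt the SNI before sending the ClientHello.
record = DNS_RECORDS["bob.example"]
client_private, client_public = keypair()
client_key = derive_key(client_private, record["esni_key"])
encrypted_sni = xor_with_keystream(client_key, b"bob.example")

# Server side: the client's public value arrives with the ClientHello.
server_key = derive_key(server_private, client_public)
print(xor_with_keystream(server_key, encrypted_sni))
```

The point of the sketch is the ordering: the key material reaches the client through DNS before the first TLS message is ever sent, which is what breaks the chicken-and-egg loop.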

ECH- Encrypted Client Hello

Why do we need to encrypt the entire handshake? The ClientHello contains more than SNI: it also carries ALPN (Application-Layer Protocol Negotiation, which decides the application-layer protocol used for the rest of the connection), the cipher suites the client supports, and so on. None of this reveals the content that the client and server exchange, but an observer can still learn which domain the client is visiting and which protocol it intends to use, and that information can help them block, poison or intercept the communication.

Now you might ask why this wasn’t the norm from the start, instead of encrypting just one field, i.e., SNI. An earlier proposal, built around 0-RTT (zero round-trip time), had the server give the client a key during its very first handshake, with all later handshakes encrypted using this key. However, this was not adopted, since the first handshake would still be unencrypted.

How do we encrypt the handshake? Well, the answer is simple: in the same way we encrypted SNI, by using DNS.

Credit: blog.cloudflare.com

But it isn’t that simple. What if the server is unable to decrypt the ClientHello, for example because the client used a stale key from its cache? Here, a brilliant idea was used: two ClientHello messages, ClientHelloInner and ClientHelloOuter. ClientHelloInner contains the actual sensitive information about the ‘backend’ origin server that the client really wants to reach, and it is sent encrypted inside ClientHelloOuter. ClientHelloOuter itself is unencrypted and contains only data pertaining to the ‘client-facing’ server, which belongs to the ECH service provider. If decryption fails, the client-facing server completes the handshake using ClientHelloOuter and returns the correct public key. Once the client sees that the connection was completed with ClientHelloOuter, it immediately aborts that connection and retries the handshake with the new key.
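The retry logic is easier to follow as a small simulation. The Python sketch below does no real cryptography: a key ID stands in for the public key, and "decryption succeeds" simply means the client used the ID the server currently holds. All class names, field names and hostnames are invented for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClientHello:
    outer_sni: str   # client-facing server name, visible to everyone
    key_id: int      # which public key the inner part was encrypted with
    inner_sni: str   # backend name; stands in for the encrypted ClientHelloInner


@dataclass
class ServerReply:
    accepted_inner: bool                 # True if ClientHelloInner was decrypted
    served_name: str                     # which name the handshake completed for
    retry_key_id: Optional[int] = None   # fresh key offered after a failure


class ClientFacingServer:
    def __init__(self, name: str, current_key_id: int):
        self.name = name
        self.current_key_id = current_key_id

    def handle(self, hello: ClientHello) -> ServerReply:
        if hello.key_id == self.current_key_id:
            # Decryption works: continue the handshake for the backend server.
            return ServerReply(True, hello.inner_sni)
        # Stale key: finish with ClientHelloOuter and hand back the right key.
        return ServerReply(False, self.name, retry_key_id=self.current_key_id)


def connect(server: ClientFacingServer, backend: str, cached_key_id: int,
            max_attempts: int = 2):
    """Return (name reached, attempts used), retrying once with a fresh key."""
    key_id = cached_key_id
    for attempt in range(1, max_attempts + 1):
        reply = server.handle(ClientHello(server.name, key_id, backend))
        if reply.accepted_inner:
            return reply.served_name, attempt
        # Completed with ClientHelloOuter: abort and retry with the new key.
        key_id = reply.retry_key_id
    raise ConnectionError("ECH retry failed")


server = ClientFacingServer("provider.example", current_key_id=7)
print(connect(server, "bob.example", cached_key_id=3))  # stale cache, needs a retry
print(connect(server, "bob.example", cached_key_id=7))  # fresh key, first try
```

In both runs the only name an observer could have read is the client-facing one, provider.example; the backend name travels only inside the encrypted part.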

ECH solves many privacy problems on the internet. Combined with encrypted DNS (for example, DNS over HTTPS), it makes it much harder for anyone on the network to see which website a user is visiting, and brings the dream of a truly free internet closer to reality.
