Explaining the CRIME weakness in SPDY and SSL
There is an interesting new security weakness discovered in SPDY and SSL/TLS that allows attackers to decrypt the session cookies for other websites. This weakness, known as CRIME, was discovered by Juliano Rizzo and Thai Duong. They will present their full findings at the Ekoparty Security Conference in Buenos Aires later this month.
Since SPDY has great potential to speed up websites, this blog post will explain the CRIME weakness so that the frontend performance community can understand the issue. Rizzo and Duong are working with browser and web server vendors to resolve the weakness, and this is not something the majority of people, let alone performance lovers, need to worry about. However, the CRIME weakness demonstrates how performance optimizations can lead directly to security problems. For this reason, I believe it is critical for the frontend performance community to understand how CRIME works and use it as a lesson about what can happen when applying performance optimizations.
The CRIME weakness exploits how data compression and data encryption interact to discover information about the underlying encrypted data. Doing this over and over again allows an attacker to eventually decrypt the data and recover HTTP cookies like session cookies. How fast is “eventually”? In the CRIME weakness demo video, Rizzo and Duong recovered encrypted cookies for Stripe in only a few minutes.
Why is this bad? Well, session cookies are like the Golden Ticket from Charlie and the Chocolate Factory. Once an attacker has one, they can impersonate you and essentially hijack your account. If you want to learn more, read up on session hijacking and tools like Firesheep.
Compression at the Transport Layer
As we know from our Lose the Wait series, a big improvement to web performance can come from simply reducing the amount of content that gets sent to the client. HTTP only allows us to compress the response body. There is no mechanism to compress HTTP request or response headers, and no mechanism to compress an HTTP request which has a body.
SSL/TLS has a feature that almost no one knows about: SSL can compress the data sent between the client and the web server. It’s so rare that, despite having worked exclusively on web protocols for most of a decade, I had never heard about it. I only ran across it a few months ago while writing the SSL handshake code for the Zoompf scanner to detect whether a server supported SPDY.
Compressing at the transport layer can be good and can be bad. On the plus side, if you have not properly configured HTTP compression (something that is surprisingly tricky to do properly), SSL/TLS’s compression will compress the content for you. As an added bonus, SSL/TLS will compress everything, including those pesky HTTP headers. A downside is that the transport layer SSL protocol has no context about what type of data the application layer HTTP protocol is transmitting. That’s the whole point of the layered OSI networking model: layers are agnostic about the data that they transmit. From SSL/TLS’s perspective there is a stream of bytes, and, if configured to do so, SSL/TLS can compress those bytes before encrypting them. It does not matter whether those bytes are an incompressible blob of data like a JPEG image, an already HTTP-compressed CSS file, or an uncompressed HTML document. SSL/TLS does not care; it just compresses.
SPDY is a little smarter in this regard: it actually sits between HTTP and SSL and has the context to know what to compress. SPDY knows to always compress the HTTP headers, regardless of the body.
The key point here is that both SSL/TLS’s “blindly compress anything if enabled” and SPDY’s intelligent compression will compress the HTTP headers before encrypting them and sending them between the client and server.
How Compression and Encryption Interact
To understand the CRIME weakness, we need to understand how lossless data compression works. Fundamentally, lossless data compression works by finding redundancies in a body of data, and representing those redundancies in a smaller fashion. Let’s do a simple example.
AAAAABCDEFGH = 5ABCDEFGH
Here I’ve used run length encoding, a very simple compression scheme. I replaced the redundant sequence "AAAAA" with "5A" and achieved a 25% reduction in size (12 characters down to 9). (Don’t get caught up too much in RLE or how it works. I’m simplifying this so we can quickly get to the larger issue and avoid getting into a technical analysis of lossless data compression algorithms.) SPDY and SSL/TLS don’t use RLE and instead use more complex lossless compression algorithms like DEFLATE. However, that doesn’t matter for this discussion. Whether using RLE or DEFLATE, the principles of lossless data compression and its effects on encryption are exactly the same.
Well-encrypted data has no redundancies for a compressor to find. Encrypted output should be uniformly random. Patterns in encrypted output give information about how the encryption was performed and thus provide insight into the input data that was encrypted. So, for compression plus encryption to work, you must first compress data, then encrypt it.
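Before combining compression with encryption, it may help to see the toy RLE scheme from the example above as code. This is only a minimal sketch of the simplified scheme used in this post (the function name `rle_encode` is my own, and a real RLE format would also need a way to handle digits in the input); it is not how DEFLATE works.

```python
from itertools import groupby


def rle_encode(text: str) -> str:
    """Toy run-length encoder: a run of 2+ identical characters becomes
    <count><char>; a single character is copied unchanged."""
    parts = []
    for char, run in groupby(text):
        count = len(list(run))
        parts.append(f"{count}{char}" if count > 1 else char)
    return "".join(parts)


if __name__ == "__main__":
    for source in ("AAAAABCDEFGH", "ABCDEFGHIJKL"):
        encoded = rle_encode(source)
        saved = 1 - len(encoded) / len(source)
        print(f"{source} -> {encoded} ({saved:.0%} smaller)")
```

The first string shrinks from 12 characters to 9, while the second has no runs and passes through unchanged.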
Let’s see how this works. We will take two strings, use our simple RLE compression scheme, and encrypt the output.
Source         Compressed     Encrypted
---------------------------------------------------
ABCDEFGHIJKL = ABCDEFGHIJKL = Z@%fkT2r$#!B
AAAAABCDEFGH = 5ABCDEFGH    = jhG*4m,$A
We see that the first string, ABCDEFGHIJKL, could not be compressed using RLE, because there was no redundancy. The other string, AAAAABCDEFGH, could be compressed, making the input to the encryption step smaller. This in turn means the encrypted output is smaller. And that is the interesting part! If we know that compression happens before encryption, and encrypted output 2 is shorter than encrypted output 1, then we know source 2 had more redundancy than source 1. But how do we leverage this into an attack?
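To make the length leak concrete, here is a rough sketch using Python's `zlib` (DEFLATE) instead of RLE. The `toy_stream_encrypt` function and the key are purely hypothetical stand-ins for a real cipher and are not secure; the only property that matters here is that, like a stream cipher, the ciphertext is exactly as long as its input.

```python
import hashlib
import zlib


def toy_stream_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Illustrative stand-in for a stream cipher (NOT secure): XOR the
    plaintext with a SHA-256-derived keystream. The ciphertext has the
    same length as the plaintext."""
    keystream = bytearray()
    counter = 0
    while len(keystream) < len(plaintext):
        keystream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    # zip() stops at the end of the plaintext, trimming the extra keystream.
    return bytes(p ^ k for p, k in zip(plaintext, keystream))


def compress_then_encrypt(key: bytes, data: bytes) -> bytes:
    """Compress first, then encrypt, as SSL/TLS compression and SPDY do."""
    return toy_stream_encrypt(key, zlib.compress(data))


if __name__ == "__main__":
    key = b"hypothetical demo key"
    samples = {"redundant": b"A" * 64, "varied": bytes(range(64))}
    for label, data in samples.items():
        ciphertext = compress_then_encrypt(key, data)
        print(f"{label}: {len(data)} bytes in, {len(ciphertext)} bytes out")
```

Both inputs are 64 bytes long, yet the redundant one produces a much shorter ciphertext. Anyone watching the wire can see that difference without knowing the key.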
Breaking Encryption with Compression
Consider this situation:
Source           Compressed       Encrypted
-------------------------------------------------
XYZABCDEFGHIJK = XYZABCDEFGHIJK = At9XeCNVxKt@XZC
Here, an attacker’s input is added to some source data, which is then compressed, and then encrypted. The attacker can’t see this source data, or what it looks like compressed. All the attacker knows is the data he supplied, and the encrypted output. So, to the attacker, things actually look like this:
Source              Compressed          Encrypted
------------------------------------------------------
XYZ[Unknown Data] = [Totally Unknown] = At9XeCNVxKt@XZC
The attacker wants to know the contents of that Unknown data. The “compression before encryption” provides a way to do this, based on the information the attacker gets to supply. Consider what happens when the attacker tries three different input strings:
Source              Compressed          Encrypted
------------------------------------------------------
ZZZ[Unknown Data] = [Totally Unknown] = QvnQSHvQWB3*QR
YYY[Unknown Data] = [Totally Unknown] = f*fB&M7sya*u7F
AAA[Unknown Data] = [Totally Unknown] = rAW^26uffH%8
"AAA" as the input from the attacker adds more redundancy to the data that will be compressed, which compresses the data better, which makes the encrypted output smaller. This tells the attacker that there is an "A" in the original unknown data! Success! An attacker can repeat this, over and over, using different input, and determine the contents of the unknown data.
This is how the CRIME weakness works. An attacker adds data to some content containing, among other things, HTTP cookies with session information. That data gets compressed, and then encrypted. The attacker sees that encrypted data and, by doing this over and over again while changing the attacker-controlled input, gains insight into the redundancy that gets compressed and thereby learns the content of the HTTP cookies.
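The three-guess experiment above can be sketched with the same toy RLE scheme. This is only an illustration under simplifying assumptions: the names `oracle` and `guess_first_char` are mine, encryption is left out because a length-preserving cipher would not change the number the attacker observes, and with RLE the attacker only learns the character adjacent to the injected input (DEFLATE can also match repeats at a distance).

```python
import string
from itertools import groupby


def rle_encode(text: str) -> str:
    """Toy run-length encoder: runs of 2+ become <count><char>."""
    parts = []
    for char, run in groupby(text):
        count = len(list(run))
        parts.append(f"{count}{char}" if count > 1 else char)
    return "".join(parts)


def oracle(attacker_input: str, secret: str) -> int:
    """All the attacker sees: the length of the compressed output for
    attacker_input followed by the unknown data."""
    return len(rle_encode(attacker_input + secret))


def guess_first_char(secret: str, alphabet: str) -> str:
    """Try every candidate character three times in a row and keep the
    one that produces the shortest output."""
    lengths = {char: oracle(char * 3, secret) for char in alphabet}
    return min(lengths, key=lengths.get)


if __name__ == "__main__":
    unknown_data = "ABCDEFGHIJK"  # hidden from the attacker in a real attack
    for probe in ("ZZZ", "YYY", "AAA"):
        print(probe, "->", oracle(probe, unknown_data))
    print("guess:", guess_first_char(unknown_data, string.ascii_uppercase))
```

Only the "AAA" probe merges with the unknown data into a single longer run, so it is the only probe whose output comes back one character shorter.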
Speeding up the attack
The attacker can now recover the data, but this could be slow. After all, they have to try, one character at a time, to find redundancies and deduce what all the characters are. However, the “Unknown Data” section is not all that unknown. In fact, it’s quite well defined, as we shall see. This allows the attacker to use what’s called a partial known plaintext attack, since the attacker has a pretty good idea of what some of the plaintext looks like. Partial known plaintext attacks are widely used to break cryptographic systems. In fact, partial known plaintext attacks, in the form of “cribs” (predictable phrases that operators were known to send), were used by the Allies in World War II to help break messages encrypted with the Enigma system.
So what do we know about the Unknown data? We know that it consists of HTTP headers! Based on the information supplied by Rizzo and Duong, it is not clear if the attacker is looking at encrypted HTTP request headers sent by the client or encrypted HTTP response headers from the server. Based on my experience, I think CRIME is examining the request headers because they would always contain the user’s session cookie. Regardless, it doesn’t matter whether CRIME is focused on request or response headers because the structure is the same. It’s a request or status line, followed by a series of lines, each of which ends in a CRLF sequence, and the entire block ends with a double CRLF sequence. Each line starts with an HTTP header name, which is almost certainly from a list of only a few dozen possible strings.
The attacker knows even more. Browsers and servers don’t change the order of their HTTP headers with each request or response, so the attacker knows exactly the order and structure of the headers. For example, request headers for Browser A are always ordered
Accept-Language. The Apache web server used by a bank might return headers in the order
Content-Length. While header order may be different from browser to browser or server to server based on their version and configuration, the same browser or the same server will not alter the order of the headers between different requests or responses. In other words, they will be consistent, even if they are different.
Additionally, the attacker knows most of the HTTP header values as well. For requests, values for headers like
Pragma, don’t change, and those headers also have only relatively few possible values. The only HTTP request headers that could change would be things like
If-Modified-Since. The same is true for response headers from the server. The Content-Type header value for a response doesn’t change every time. Nor does the Server header value. Nor does Cache-Control. Nor does Vary. Only a few headers, like
Cookie could change. An attacker could query the server directly to determine things like
Date and even cookie format.
The net result of all this knowledge is far less variability in the Unknown data. The attacker can use this information to better understand the redundancy that already exists in the data, allowing them to more quickly determine how their changes to the input interact with the truly unknown pieces of data, like the session cookies.
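As a rough sketch of why that known structure helps, consider a request where the attacker controls only the path and already knows the cookie is sent as `session=<value>`. Everything here is an assumption for illustration: the host, header set, cookie name, cookie value, and the helper names `build_request` and `observed_length` are hypothetical, and this is not necessarily how Rizzo and Duong inject their input. The sketch also compares whole guesses; a real attack would guess one character at a time, where the difference can be as small as a few bits and ties are common.

```python
import zlib

SECRET_COOKIE = "4f9a1c7e2b8d0356"  # hypothetical value the attacker wants


def build_request(path: str) -> bytes:
    """Assemble request headers: a request line, CRLF-terminated header
    lines, and a final blank line. Only the path is attacker-controlled."""
    lines = [
        f"GET {path} HTTP/1.1",
        "Host: example.com",
        "User-Agent: ExampleBrowser/1.0",
        f"Cookie: session={SECRET_COOKIE}",
    ]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def observed_length(guess: str) -> int:
    """Length of the DEFLATE-compressed headers when the attacker plants
    the known prefix 'session=' plus a guess in the request path."""
    return len(zlib.compress(build_request(f"/?session={guess}")))


if __name__ == "__main__":
    for guess in ("q8Zk1LmX0pRt5VwY", SECRET_COOKIE):
        print(guess, "->", observed_length(guess), "bytes")
```

When the guess equals the real cookie value, the compressor can replace the entire `session=...` string in the Cookie header with a back-reference to the copy in the path, so the output is noticeably shorter than for a wrong guess of the same length.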
Things Still Unknown
CONNECT and act as a dumb pipe pushing bytes. Hopefully the method of how the attacker is including their input will be revealed in their upcoming presentation.
SPDY and SSL/TLS can be used to compress content, including HTTP headers, before it is encrypted for transit. In the CRIME attack, an attacker is able to include data in the source material before it is compressed and then encrypted. By choosing different input data and observing the length of the encrypted data that comes out, the attacker is able to learn how their input affects data redundancy and ultimately recover information like HTTP session cookies.
If you are interested in things like SPDY and SSL/TLS compression, you would love Zoompf. Zoompf tests your web application for nearly 400 performance issues. You can get a free performance scan of your website now and take a look at our Zoompf WPO product at Zoompf.com today!