Bandwidth, Latency, and the “Size of your Pipe”

Posted: December 28, 2011 at 6:12 pm

I talk about performance a lot, and more often than not people say something like "I have a really good Internet connection, so websites are fast for me." What was a little unsettling was when a company told me "Our datacenter has a really fat pipe, so our website is fast." My reply, delivered with a bit of a smile, is "The size of your pipe doesn’t guarantee a satisfying user experience."

These conversations show that there is a fundamental misunderstanding about how data is transferred on the Internet. Technical and non-technical people alike are confusing or combining the separate concepts of bandwidth and latency. Understanding these two concepts is critical to understanding the core ideas behind front-end web performance and optimization techniques.

If you think of the Internet as a series of tubes, latency is the length of the tube between two points. Bandwidth is how wide the tube is. Indeed, it’s named bandwidth because it describes the width of the communications band. The wider the tube the more data you can send in parallel. The key point here that gets missed is that, regardless of how much data you are sending, you still have to move it the distance from point A to point B. That takes time and that is the latency.

To better illustrate how latency is not affected by bandwidth, I will use an example from Stuart Cheshire’s excellent 1996 essay, "It’s the Latency, Stupid." The physical distance from Boston, Massachusetts to Stanford University in California is 4320 kilometers. The speed of light traveling through a fiber optic data cable is 200,000 km/s (compared to 300,000 km/s in a vacuum). So the travel time for a single photon of light across a direct fiber optic connection from Boston to Stanford is 4320 km / 200,000 km/s = 21.6 milliseconds. The round-trip time from Boston to Stanford and back is 43.2 milliseconds. These are fundamental laws of nature. You will never get something to travel faster than that. There will always be a delay of at least 43.2 ms for a round trip between Boston and Stanford.
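The arithmetic above can be reproduced in a few lines of Python. This is only a sketch of the calculation: the distance and speed figures come from the paragraph above, while the function and constant names are made up for illustration.

```python
# Figures quoted in the post; the names below are illustrative only.
SPEED_IN_FIBER_KM_PER_S = 200_000   # speed of light in fiber optic cable
BOSTON_TO_STANFORD_KM = 4_320       # physical distance used in the example


def one_way_delay_ms(distance_km, speed_km_per_s=SPEED_IN_FIBER_KM_PER_S):
    """Minimum time for a signal to cover distance_km, in milliseconds."""
    return distance_km / speed_km_per_s * 1000


one_way = one_way_delay_ms(BOSTON_TO_STANFORD_KM)
round_trip = 2 * one_way
print(f"one way: {one_way:.1f} ms, round trip: {round_trip:.1f} ms")
```

Note that bandwidth does not appear anywhere in the calculation. Only distance and signal speed do.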

In practice, however, the latency delay is more than 43.2 ms. This is because we do not have a single continuous piece of fiber optic cable connecting Boston and Stanford. Instead, the path goes through several segments along the way where dozens of pieces of networking equipment add to the delay. Regardless of how big and powerful that Cisco router is, it is not functioning at the speed of light! Typically, the Boston to Stanford route is on the order of 75-85 milliseconds.

Remember, we have yet to talk about bandwidth, or connection speeds. You need to wrap your head around the fact that to send a single bit of data you will always have a delay caused by latency for the data to travel the physical distance to its destination. Having a 25 Mbps connection does not somehow allow a single bit of data to travel that distance any faster. A high-bandwidth connection simply allows you to send or receive more data in parallel. The data still needs to travel to and from your computer. The diagram below illustrates this concept:

Here we see two connections: one which is low bandwidth and one which is high bandwidth. When transmitting a JPEG across both connections, the latency for the first byte of data to travel from the source to the destination is the same. The high bandwidth connection downloads the file faster than the low bandwidth connection because more data can travel in parallel. So the transmission is faster, but the latency is still there.
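One rough way to model this is to treat the total transfer time as a fixed latency term plus a transmission term that shrinks as bandwidth grows. The sketch below makes that assumption explicit. The file size, the latency value, and the function name are hypothetical, and it ignores protocol overhead such as TCP handshakes and slow start.

```python
def transfer_time_ms(size_bytes, bandwidth_mbps, latency_ms):
    """Simplified model: fixed latency plus time to push the bits through the link."""
    bits = size_bytes * 8
    transmit_ms = bits / (bandwidth_mbps * 1_000_000) * 1000
    return latency_ms + transmit_ms


JPEG_BYTES = 100_000   # hypothetical image size
LATENCY_MS = 40        # hypothetical latency, identical on both links

slow = transfer_time_ms(JPEG_BYTES, bandwidth_mbps=1.5, latency_ms=LATENCY_MS)
fast = transfer_time_ms(JPEG_BYTES, bandwidth_mbps=25, latency_ms=LATENCY_MS)

print(f"1.5 Mbps link: {slow:.1f} ms")
print(f"25 Mbps link:  {fast:.1f} ms")
```

However large the bandwidth value gets in this model, the result never drops below the latency term.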

In the late 1990s and early 2000s, the impact of latency was less visible because personal Internet connections were quite slow compared to today. The latency was masked because the delay in sending a request and waiting for a response was much smaller than the total time it took to download all of the response. However, that is no longer the case. It is not uncommon for a browser requesting a small image to wait 100 to 150 milliseconds before spending 5 milliseconds to download the image contents. This means that latency accounts for more than 95% of the total time to request and download the resource. This is tremendously inefficient!

So why should a front-end performance advocate care about latency? Because it’s the basis for many of your performance optimizations! In fact, the entire class of "reduce the number of HTTP requests" optimizations is all about avoiding latency.

Imagine downloading ten 10-kilobyte files. The total download time would be 10a + 10b, where a is the delay due to latency and b is the amount of time it takes to download one 10-kilobyte file. Now consider downloading a single 100-kilobyte file. The total download time would be a + 10b. Here a is the overhead of a single request and response, while 10b is the amount of time spent downloading the 100-kilobyte file. In this example, the time difference between downloading ten 10-kilobyte files and a single 100-kilobyte file is 9a. If we were transmitting these files using HTTP between Stanford and Boston, that would be 85 ms * 9 = 765 milliseconds, or over 3/4 of a second saved! And that’s a best-case scenario. We haven’t even started talking about congestion, transmission errors, QoS throttling, taking separate paths, or any of the other things that introduce even more latency. And we haven’t even begun to talk about CDNs and latency. I’ll save that for another post.
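The 10a + 10b versus a + 10b comparison can be written out as a small Python sketch. It assumes, as the paragraph above does, that the requests are made one after another. The 85 ms figure for a comes from the post, while the value for b and the function name are hypothetical.

```python
def sequential_download_ms(requests, units_of_data, a_ms, b_ms):
    """Total time when each request pays latency a and each 10 KB unit takes b."""
    return requests * a_ms + units_of_data * b_ms


A_MS = 85   # per-request latency between Stanford and Boston, from the post
B_MS = 5    # hypothetical time to download one 10-kilobyte file

ten_small_files = sequential_download_ms(10, 10, A_MS, B_MS)  # 10a + 10b
one_large_file = sequential_download_ms(1, 10, A_MS, B_MS)    # a + 10b
saved = ten_small_files - one_large_file                      # 9a

print(f"ten small files: {ten_small_files} ms")
print(f"one large file:  {one_large_file} ms")
print(f"saved:           {saved} ms")
```

The saving is 9a whatever value is chosen for b, because the download term is the same in both cases.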

Remember, latency is the amount of time required to travel the path from one location to another. Bandwidth is how much data can be moved in parallel along that path.

Performance Calendar 2011: Advice on Trusting Advice

Posted: December 24, 2011 at 3:01 pm

It’s that time of year again for Stoyan’s excellent Performance advent calendar. I wrote a post for it which was published today entitled: Advice on Trusting Advice. Mathias Bynens summed up the post as “Always examine a third party code snippet before including it in your site, regardless of who wrote it” but it’s bigger than that. It’s not just that snippets from competent people can be slow, or that Google can give you bad advice, or even that Zoompf can give you bad advice for that matter. The broader theme is that all aspects of performance must be transparent so they can be discussed, examined, independently verified, and improved. The reason I could write that post about a problem a customer found with a Google+ snippet is that Google was open and transparent about what their snippet did and provided guidance, though incorrect, on how to use the snippet. This allowed me to discuss what it did, show the problems, make recommendations, hopefully change Google’s advice, and improve things for everyone. But only because of transparency.

Sadly I’m seeing some trends where people and companies are making web performance opaque. I began talking about transparency and performance responsibility in my Amazon Silk and Performance Responsibility post. These are topics I plan to speak of a lot in the new year.

REDbot: Awesome HTTP Testing

Posted: December 22, 2011 at 8:43 pm

I am always on the lookout for new and cool web performance and quality tools. One of my favorite tools is REDbot. Every web performance advocate should be using REDbot regularly. Want to know why?

To start, REDbot was created by the awesome Australian Mark Nottingham. Mark writes some excellent technical-yet-easy-to-understand essays on the inner workings of HTTP, such as the definitive Caching tutorial for Web authors and web masters. With REDbot, Mark has taken his vast knowledge of all things HTTP and distilled that into a wonderful tool written in Python. So, Mark’s brain is Reason #1.

Reason #2 is what the tool does. The best way to describe REDbot might be "HTTP Lint", which, funny enough, was the name of the first C# project of what became the Zoompf scanner. REDbot examines a server’s HTTP response headers and body for performance issues, quality issues, compatibility issues, adherence to the HTTP RFCs, and provides various ancillary info messages. Frankly the depth of issues it looks for is really quite amazing; currently over 150 different items can be detected and reported by REDbot.

screen of some of the issues that REDbot finds

As an example, here are just some of the things that REDbot checks for with the Last-Modified header.

  • Is the date format valid? Invalid dates can’t get conditionally cached.
  • Is the date in the future? Resources dated in the future cannot be cached.
  • Does the web server correctly return a 304 if the resource has not been modified?
  • Are there duplicate Last-Modified headers? Do they have different values?
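To give a feel for what checks like these involve, here is a minimal Python sketch covering the date format, future date, and duplicate header checks from the list above. It is not REDbot's code. The function name and messages are hypothetical, and the standard library parser used here accepts more date formats than HTTP strictly allows. The 304 check is left out because it requires sending a second, conditional request to the server.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def check_last_modified(values, now=None):
    """Return a list of problems found in one response's Last-Modified values."""
    now = now or datetime.now(timezone.utc)
    problems = []

    # More than one Last-Modified header is suspicious; differing values are worse.
    if len(values) > 1:
        if len(set(values)) > 1:
            problems.append("duplicate Last-Modified headers with different values")
        else:
            problems.append("duplicate Last-Modified headers")

    for value in values:
        try:
            parsed = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            # Older Python versions raise TypeError, newer ones ValueError.
            problems.append(f"invalid date: {value!r}")
            continue
        if parsed.tzinfo is None:
            # Assumption: treat a date without a usable zone as UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        if parsed > now:
            problems.append(f"date is in the future: {value!r}")

    return problems


fixed_now = datetime(2011, 12, 22, tzinfo=timezone.utc)
print(check_last_modified(["Wed, 21 Dec 2011 10:00:00 GMT"], fixed_now))
print(check_last_modified(["not a date"], fixed_now))
print(check_last_modified(["Sat, 01 Jan 2050 00:00:00 GMT"], fixed_now))
```

Passing a fixed value for now keeps the future date check repeatable when testing.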

That is just scratching the surface of what REDbot can find out about your site. Is the Content-Length header right? Is chunked encoding working properly? How many inline caching proxies did the response go through? REDbot has helped me find and fix several issues with Zoompf’s own web infrastructure. In fact, many of the issues REDbot looks for were so helpful that we added them to the list of issues that Zoompf tests for. While Zoompf does not include all of REDbot’s tests, I don’t know of any other performance tools which look for these kinds of HTTP issues. For the reason of completeness alone, REDbot needs to be part of your toolset.

Of course, detecting some HTTP problems can get pretty involved. For example, REDbot and Zoompf will verify that a server properly responds to If-Modified-Since, If-None-Match, and Range requests. Additionally, REDbot and Zoompf can confirm that the Vary response header and Accept-Encoding negotiation are operating correctly. All of this involves sending multiple requests to the web server for each issue. Testing a single static resource can involve 5 to 6 requests and processing that many responses! While this isn’t so bad when testing a single URL, doing it for multiple resources is time consuming. The public web instance of REDbot often times out when using the "check assets" feature to test multiple URLs at once. Zoompf website scans can take 2 to 4 times longer to complete if you conduct these extended HTTP tests. We are playing with a few ways to be intelligent about when we send these extra test requests but it is still a work in progress. In the meantime, this extended HTTP testing capability is disabled by default for our customers and completely unavailable for free scans.

Reason #3: REDbot is open source and hosted on GitHub, which makes it super easy to start using. It can run from the command line, but Mark “the Awesome Aussie” Nottingham (it’s his pro wrestling name, look it up) has set up a public instance of REDbot with a web interface where anyone can quickly test a resource. If you test an HTML page, you can click the "check assets" link underneath the response headers to recursively test all the referenced resources. This is a handy feature to rapidly test multiple URLs but, as we said, you will occasionally get timeouts.

Reason #4: Its web UI is gorgeous. As someone who makes incredibly crappy web interfaces (which I am convinced would somehow be better if I wrote them on a new MacBook Air), I drool over what Mark has done. Fades, transparency, context dialogs, this thing is sexy looking.

REDbot is an awesome tool which provides much needed HTTP insight and validation available nowhere else. I highly recommend it to anyone interested in the working of the web and I thank Mark Nottingham for his excellent contribution to our community.

Amazon Silk and Performance Responsibility

Posted: December 16, 2011 at 7:49 pm

Now that Amazon’s Kindle Fire tablet is making its way into customers’ hands, we are starting to hear about how its much touted Silk web browser and its web acceleration feature function in real life. Initial research shows that, at least in this 1.0 release, Silk is actually faster with acceleration turned off.

I don’t want to focus on how good or bad Silk’s acceleration feature is. Technology only gets better. I agree with Steve Souders that Silk will eventually provide a faster browsing experience. Making Silk’s acceleration work is a technical issue which Amazon can solve. Instead I want to focus on the larger policy issue: Who is responsible for web performance?

With Silk, Amazon is telling web developers that they suck at making fast sites. In fact, they suck so much that Amazon is not going to wait for them to stop sucking. Instead, Amazon has taken the unprecedented step of proxying all of the web traffic of Silk users and applying performance optimizations on the fly. It’s mod_pagespeed, but in reverse.

The traditional view is that it’s the website owners’ responsibility to make a website faster. But Amazon is saying “it’s the creator of the consumption device’s responsibility to make a website faster.”

That’s a pretty startling shift in view.

A bunny insulting you

Amazon has made a business decision that the Kindle Fire will provide the fastest tablet browsing experience possible. (Sound familiar?) To ensure this happens, Amazon developed its own web browser and is spending millions of dollars on the infrastructure to support this acceleration feature. Web server does not support SPDY? Solved. Website not using compression? Solved. Content not minified? Solved. But only if you are using a Kindle Fire.

In other words, web performance is so poor that hardware vendors are able to differentiate themselves by solving your website’s performance problems.

It’s not surprising that Amazon decided to take matters into its own hands. I have written extensively on the matter and presented Zoompf’s research on it at Velocity numerous times. 70% of images aren’t losslessly optimized. 75% of sites have an item which should have HTTP compression but doesn’t. 30% of webpages contain errors. Even Twitter is choking on implementing some of the most basic of optimizations. It’s not pretty out there.

But is it really a 3rd party, be it an online retailer, a consumer electronics company, a network device manufacturer, or a software vendor, who magically absolves you of the responsibility to make a fast site? This is an interesting question.

I am of the opinion that it is the web app creator’s responsibility to make it fast. In fact, the entire philosophy of Zoompf is, ultimately, web performance transparency: Scan my site; tell me specifically what the performance problems are; tell me how severe they are and what their impact is; show me how to fix them with configuration, process, and code changes; validate they get fixed; monitor and track that they are not re-introduced. No magic. No obfuscation. No “trust us, performance is hard, we got this, pay no attention to the man behind the curtain.” Just data, and the tools and guidance to make informed decisions.

Who is responsible for web performance? Who do you think? This question will play out over the next several years and I’ll write more of my thoughts soon.

The Many Faces of Web Performance Tools

Posted: December 1, 2011 at 4:52 pm

“Oh, web performance! So your company tests how much load my web server can handle, right?”

It is all too common that when I am talking with someone about web performance tools, we need to spend a few minutes just to understand each other’s vocabulary. The term “web performance” is commonly used for many types of products. Further complicating the situation, these tools differ vastly in terms of what type of information they supply, who the primary user is, and how they help an organization improve overall web performance. If you don’t understand the different categories of performance tools and what business problems they address, you can make the costly mistake of purchasing the wrong product and not addressing the root of the issue you are trying to fix.

I recently saw an example of this while speaking with a Vice President of Development for a large US communications company with hundreds of web properties. Web analytics revealed that users were abandoning their websites. Through follow up surveys with these users, the team discovered that the drop off was directly related to the performance of their web properties. The websites were slow and they were losing visible revenue opportunities. The Vice President was shocked. “But we have web performance tools! How can our web performance be bad?”

The reason is that they were simply trying to solve a business problem with the wrong type of tool. They had purchased a performance monitoring tool which measured the performance of their web properties from data centers around the world. Performance monitoring tools are certainly important because they help determine when a website is responding slower than a user-defined threshold. This is a good solution to the business problem of ensuring your application’s performance does not degrade once in production.

However, the communications company really needed a different business problem solved. They wanted to know why their web properties were slow but they didn’t fully understand what business problems a performance monitoring tool would address. To them, it was a “performance tool” so it should help them have a faster site. When it didn’t, they were frustrated, had spent tens of thousands of dollars on the problem, and still didn’t have a faster website.

To help provide some education on this topic, we at Zoompf created a whitepaper titled “A Guide to Modern Web Performance Tools.” In this whitepaper we examine the different categories of performance tools to help you evaluate whether you are using the right products for your organization. We start by defining three categories of web performance products: load testing tools, performance monitoring tools, and web performance optimization tools. We explain what they do, how they help an organization achieve overall web performance goals, and provide recommendations about how to deploy these tools to achieve maximum return on your performance investments.

The whitepaper is of course free. Please feel free to download it here to find out more on this topic and as always, please do send me your comments or suggestions.