Posts Tagged ‘browser’

Browser Wars II: Why I returned to Firefox

September 7th, 2008 by smp | Comments | Filed in Browsers, Software, Technology, The Web, Web Performance, Work

Since the release of Google Chrome on September 2, I have been using it as my day-to-day browser. Spending up to 80% of my computer time in a browser means that this was decision which affected a huge portion of my online experience.

I can say that I put Chrome through its paces, on a wide-variety of sites, from the simple to the extremely content-rich. From the mainstream, to the questionable.

This morning I migrated back to Firefox, albeit the latest Minefield/Firefox 3.1alpha.

The reasons listed below are mine. Switching back is a personal decision and everyone is likely to have their own reasons to do it, or to stay.

Advertising

I mentioned a few times during my initial use of Chrome that I was having to become used to the re-appearance of advertising in my browsing experience [here and here]. From their early release as extensions to Firefox, I have used AdBlock and AdBlock Plus to remove the annoyance and distraction of online ads from my browsing experience.

When I moved to Chrome, I had to accept that I would see ads. I mean, we were dealing with a browser distributed by one of the largest online advertising agencies. It could only be expected that they were not going to allow people to block ads out of the gate, if ever.

As the week progressed, I realized that I was finding the ads to be a distraction from my browsing experience. Ads impede my ability to find the information I need quickly.

Older Machines

My primary machine for online experiences at home is a Latitude D610. This is a 3-4 year-old laptop, with a single core. It is still far more computing power than most people actually need to enjoy the Web.

While cruising with Chrome, I found that Flash locked up the entire machine on a very regular basis. Made it unsuable. This doesn’t happen on my much more powerful Latitude D630, provided by my work. However, as I have a personal laptop, I am not going to use my work computer for my personal stuff, especially at home.

I cannot have a browser that locks up a machine when I simply close a tab. It appears that the vaunted QA division at Google overlooked the fact that people don’t all run the latest and greatest machines in the real world.

Auto-Complete

I am completely reliant on form auto-completes. Firefox has been doing this for me for a long time, and it is very handy to simply start typing and have Firefox say “Hey! This form element is called email. Here are some of the other things you have put into form elements called email.”

If you can build something as complex as the OmniBox, surely you can add form auto-completes.

The OmniBox

I hate it. I really do. I like having my search and addresses separate. I also like an address bar that remembers complete URLs (including those pesky parameters!), rather than simply the top-level domain name.

It is a cool idea, but it needs some refining, and some customer-satisfaction focus groups.

I Don’t Use Desktop-replacing Web Applications

I do almost all of my real work in desktop-installed Web applications. I have not made the migration to Web applications. I may in the future. But until then, I do not need a completely clean browsing experience. I mentioned that the battle between Chrome and Firefox will come down to the Container v. the Desktop - a web application container, or a desktop-replacing Web experience application.

In the last 48 hours, I have fallen back into the Web-desktop camp.

Summary

In the future, I will continue to use Chrome to see how newer builds advance, and how it evolves as more people begin dictating the features that should be available to it.

For my personal use, Chrome takes away too much from, and injects too much noise into, my daily Web experience to continue to use as the default browser. To quote more than a few skeptics of Chrome when it was relased - “It’s just another browser”.

Tags: , , , , , , , , , , , , , , ,

Joost: A change to the program

September 5th, 2008 by smp | Comments | Filed in Software, Streaming, Technology, The Web

In April 2007, I tried out the Joost desktop client.  [More on Joost here and here]

I was underwhlemed by the performance, and the fact that the application completely maxxed out my dual core CPU, my 2G of RAM, and my high-speed home broadband. I do remember thinking at the time that it seemed weird to have a Desktop Client in the first place. Well, as Om Malik reports this morning, it seems that I was not alone.

After this week’s hoopla over Chrome, moving in the direction of the browser seems like a wise thing to do. But I definitely hear far more buzz over Hulu than I do for Joost on the intertubes.

Update

Michael Arrington and TechCrunch weigh into the discussion.

Tags: , , , , , , , , ,

Chrome v. Firefox - The Container and The Desktop

September 4th, 2008 by smp | Comments | Filed in Browsers, Technology, The Web, Web Performance, Work

The last two days of using Chrome have had me thinking about the purpose of the Web browser in today’s world. I’ve talked about how Chrome and Firefox have changed how we see browsers, treating them as interactive windows into our daily life, rather than the uncontrolled end of an information firehose.

These applications, that on the surface seem to serve the same purpose, have taken very different paths to this point. Much has been made about Firefox growing out of the ashes of Netscape, while Chrome is the Web re-imagined.

It’s not just that.

Firefox, through the use of extensions and helper applications, has grown to become a Desktop replacement. Back when Windows for Workgroups was the primary end-user OS (and it wasn’t even an OS), Norton Desktop arrived to provide all of the tools that didn’t ship with the OS. It extended and improved on what was there, and made WFW a better place.

Firefox serves that purpose in the browser world. With its massive collections of extensions, it adds the ability to customize and modify the Web workspace. These extensions even allow the incoming content to be modified and reformatted in unique ways to suit the preferences of each individual. These features allowed the person using Firefox to feel in control, empowered.

You look at the Firefox installs of the tech elite, and no two installed versions will be configured in the same way. Firefox extends the browser into an aggregator of Web data and information customization.

But it does it at the Desktop.

Chrome is a simple container. There is (currently) no way to customize the look and feel, extend the capabilities, or modify the incoming or outgoing content. It is a simple shell designed to perform two key functions: search for content and interact with Web applications.

There are, of course, the hidden geeky functions that they have built into the app. But those don’t change what it’s core function is: request, receive, and render Web pages as quickly and efficiently as possible. Unlike Firefox’s approach, which places the app being the center of the Web, Chrome places the Web at the center of the Web.

There is no right or wrong approach. As with all things in this complicated world we are in, it depends. It depends on what you are trying to accomplish and how you want to get there.

The conflict that I see appearing over the next few months is not between IE and Firefox and Safari and Opera and Chrome. It is a conflict over what the people want from an application that they use all the time. Do they want a Web desktop or a Web container?

Tags: , , , , , , , , , , , , , , , , ,

Google Chrome: First Impressions

September 2nd, 2008 by smp | Comments | Filed in Software, Technology, Web Performance, WebPerformance.Org

Google Chrome is out. And from first impressions, it is stinking fast. However, i do have some gripes.

  1. Comes with link underlining enabled. I hate this. It’s the first think I disable in Firefox and any browser that supports disabling underlining
  2. Where’s the “get your hands dirty under the hood” option list? I love the Firefox about:config list. Chrome needs this.
  3. Ads. I know. There is little chance for built in ad-blocking, but it’s on my wish-list.

Otherwise, it’s good…so far. And the memory usage is, well, definitely less intrusive.

I plan to use this for a while and see what happens. I will likely find something that drives me back to Firefox eventually.

Ok, found a weirdness when you use a <li> tag in the Wordpress editor. It seems that it starts injecting <div> tags to differentiate paragraphs after you close out the list.

Tags: , , , , , , ,

Google Chrome: See No Evil, Do No Evil - An Internet Performance Perspective

September 1st, 2008 by smp | Comments | Filed in Commentary, Technology, Web Performance, WebPerformance.Org

The intertubes of the Web are abuzz with talk of the new, open-source Google Chrome browser [two articles here and here]. I will not presume to wade into the debate of whether it is necessary, or what strategic business goals Google has set that rely on having its own browser. I will limit my comments to the area of Web performance.

Open-Source Browser: Ours or Theirs?

When I read that Google Chrome was an open-source browser, the first thought was: is it theirs or a re-branded Firefox? No one knows at this point, but that will have a direct effect on how the browser performs, and how extensible it will be.

HTTP Standards

Unlike other standards, HTTP standards set out how a browser uses the underlying TCP stack. MSIE6/7 have very broken implementations, and MSIE8 is building on those by increasing the number of connections per host to 6, up from 2 set out in RFC 2616.

Firefox can be configured to mangle this as well, but by default it plays by the standard, adding the option of HTTP pipelining into its mix of persistent HTTP connections.

It will be VERY interesting to see how Google Chrome comes configured out of the box, and how much control users have over the HTTP behaviour of this new browser.

(X)HTML/CSS/JS Standards

This area is a mess. No browser implements this standards in a way that is completely consistent with the written text, and page designers have to use a variety of page testing products (such as BrowserCam) prior to release to ensure that their design is somewhat presentable in all browsers on all platforms.

The rendering of Javascript will be crucial in this new browser, as so much of the new Web is built on applications that are almost completely Javascript-driven.

I am sure that there will be sites that will be completely mangled by the new browser, but, knowing Google, we will be getting a 2.0 release, the 1.0 release being used within Google for a while now to test it under real-world conditions.

Caching

As a few sites in the world do use cache-control headers properly, it will be interesting to see how a browser created by one of the major ad-serving and search providers on the Web tracks page objects. Will it follow explicit/implicit caching rules? Or will it impose a heavy penalty on bandwidth by downloading objects more frequently than other production browsers do?

Proxies, and the Debacle of the Google Web Accelerator

Back in 2005, Google launched a badly designed and gighly flawed product called the Google Web Accelerator. This product proxied Web traffic through the Google network and allowed the company to develop a pattern of user browsing habits and search selections that would allow them to better target their ad products.

I have a great fear that this will be an integrated part of the Google browser project. If it is, it should be a configurable option, not an out-of-the box standard.

I am sure that there will be a few performance conversations that occur around the Google Chrome browser in the weeks ahead. I look forward to hearing what the community has to say about this new addition to the browser wars.

Tags: , , , , , , , , , , , , ,

Internet Explorer: Plan to completely support RFC 2616 anytime before the next ice age?

March 13th, 2007 by smp | Comments | Filed in Technology, Web Performance

I am writing up a client presentation for next week, and I just realized just how flawed Internet Explorer is. Microsoft claims that the browser is standards compliant. Yet it still doesn’t support HTTP pipelining.

And the frustrating part? They won’t tell us why. I have my suspicions, which include TCP stack issues and a flawed HTTP handling mechanism that is still based on Windows 95 architecture, but an explanation from Redmond would be nice.

Every (and I mean every) other browser can do this.

Microsoft, it’s time you detached your Web browser from your OS, like you’ve forced everyone else to do.

Tags: , , , , , ,

Tags: , , , , , , , , , , , , , , , , , , , , , , , ,

Using Client-Side Cache Solutions And Server-Side Caching Configurations To Improve Internet Performance

October 3rd, 2006 by smp | Comments | Filed in Web Performance, WebPerformance.Org

In todays highly competitive e-commerce marketplace, the performance of a web-site plays a key role in attracting new and retaining current clients. New technologies are being developed to help speed up the delivery of content to customers while still allowing companies to get their message across using rich, graphical content. However, in the rush to find new technologies to improve internet performance, one low-cost alternative to these new technologies is often overlooked: client-side content caching.

This process is often overlooked or dismissed by web administrators and content providers seeking to improve performance. The major concern that is expressed by these groups is that they need to ensure that clients always get the freshest content possible. In their eyes, allowing their content to be cached is perceived as losing control of their message.

This bias against caching is, in most cases, unjustified. By understanding how server software can be used to distinguish unique caching policies for each type of content being delivered, client-side performance gains can be achieved with no new hardware or software being added to an existing web-site system.

Caching

When a client requests web content, this information is either retrieved directly from the origin server, from a browser cache on a local hard drive or from a nearby cache server[1]. Where and for how long the data is stored depends on how the data is tagged when it leaves the web server. However, when discussing cache content, there are three states that content can be in: non-cacheable, fresh or stale.

The non-cacheable state indicates a file that should never be cached by any device that receives it and that every request for that file must be retrieved from the origin server. This places an additional load on both client and server bandwidth, as well as on the server which responds to these additional requests. In many cases, such as database queries, news content, and personalized content marked by unique cookies, the content provider may explicitly not want data to be cached to prevent stale data from being received by the client.

A fresh file is one that has a clearly defined future expiration date and/or does not indicate that it is non-cacheable. A file with a defined lifespan is only valid for a set number of seconds after it is downloaded, or until the explicitly stated expiry date and time is reached. At that point, the file is considered stale and must be re-verified (preferred as it requires less bandwidth) or re-loaded from the origin server.[2]

If a file does not explicitly indicate it is non-cacheable, but does not indicate an explicit expiry period or time, the cache server assigns the file an expiry time defined in the cache servers configuration. When that deadline is reached and the cache server receives a request for that file, the server checks with the origin server to see whether the content has changed. If the file is unchanged, the counter is reset and the existing content is served to the client; if the file is changed, the new content is downloaded, cached according to its settings and then served to the client.

A stale file is a file in cache that is no longer valid. A client has requested information that had previously been stored in the cache and the control data for the object indicates that it has expired or is too stale to be considered for serving. The browser or cache server must now either re-validate the file with or retrieve the file from the origin server before the data can served to the client.

The state of an item being considered for caching is determined using one or more of 5 HTTP header messages[3] two server messages, one client message, and two that can be sent by either the client or the server.[4] These headers include: Pragma: no-cache; Cache-Control; Expires; Last-Modified; and If-Modified-Since. Each of these identifies a particular condition that the proxy server must adhere to when deciding whether the content is fresh enough to be served to the requesting client.

Pragma: no-cache is an HTTP/1.0 client and server header that informs caching servers not to serve the requested content to the client from their cache (client-side) and not cache the marked information if they receive it (server-side). This response has been deprecated in favor of the new HTTP/1.1 Cache-Control header, but is still used in many browsers and servers. The continued use of this header is necessary to ensure backwards-compatibility, as it cannot be guaranteed that all devices and servers will understand the HTTP/1.1 server headers.

Cache-Control is a family of HTTP/1.1 client and server messages that can be used to clearly define not only if an item can be cached, but also for how long and how it should be validated upon expiry. This more precise family of messages replaces the older Pragma: no-cache message. There are a large number of options for this header field, but four that are especially relevant to this discussion.[5]

Cache-Control: private/public

This setting indicates what type of devices can cache the data. The private setting allows the marked items to be cached by the requesting client, but not by any cache servers encountered en-route. The public setting indicates that any device can cache this content. By default, public is assumed unless private is explicitly stated.

Cache-Control: no-cache

This is the HTTP/1.1 equivalent of Pragma: no-cache and can be used by clients to force an end-to-end retrieval of the requested files and by servers to prevent items from being cached.

Cache-Control: max-age=x

This setting allows the indicated files to be cached either by the client or the cache server for x seconds.

Cache-Control: must-revalidate

This setting informs the cache server that if the item in cache is stale, it must be re-validated before it can be served to the client.

A number of these settings can be combined to form a larger Cache-Control header message. For example, an administrator may want to define how long the content is valid for, and then indicate that, at the end of that period, all new requests must be revalidated with the origin server. This can be accomplished by creating a multi-field Cache-Control header message like the one below.

Cache-Control: max-age=3600, must-revalidate

Expires sets an explicit expiry date and time for the requested file. This is usually in the future, but a server administrator can ensure that an object is always re-validated by setting an expiry date that is in the past an example of this will be shown below.

Last-Modified can indicate one of several conditions, but the most common is the last time the state of the requested object was updated. The cache server can use this to confirm an object has not changed since it was inserted into the cache, allowing for re-validation, versus completely re-loading, of objects in cache.

If-Modified-Since is a client-side header message that is sent either by a browser or a cache server and is set by the Last-Modified value of the object in cache. When the origin server has not set an explicit cache expiry value and the cache server has had to set an expiry time on the object using its own internal configuration, the Last-Modified value is used to confirm whether content has changed on the origin server. If the Last-Modified value on an object held by the origin server is newer than that held by the client, the entire file is re-loaded. If these values are the same, the origin server returns a 304 Not Modified HTTP message and the cache object is then served to the client and has its cache-defined counter reset.

Using an application trace program, clients are able to capture the data that flows out of and in to the browser application. The following two examples show how a server can use header messages to mark content as non-cacheable, or set very specific caching values.

Server Messages for a Non-Cacheable Object

HTTP/1.0 200 OK
Content-Type: text/html
Content-Length: 19662
Pragma: no-cache
Cache-Control: no-cache
Server: Roxen/2.1.185
Accept-Ranges: bytes
Expires: Wed, 03 Jan 2001 00:18:55 GMT

In this example, the server returns three indications that the content is non-cacheable. The first two are the Pragma: no-cache and Cache-Control: no-cache statements. With most client and cache server configurations, one of these headers on its own should be enough to prevent the requested object from being stored in cache. The web administrator in this example has chosen to ensure that any device, regardless of the version of HTTP used, will clearly understand that this object is non-cacheable.

However, in order to guarantee that this item is never stored in or served from cache, the Expires statement is set to a date and time that is in the past.[6] These three statements should be enough to guarantee that no cache serves this file without performing an end-to-end transfer of this object from the origin server with each request.

Specific Caching Information in Server Messages

HTTP/1.1 200 OK
Date: Tue, 13 Feb 2001 14:50:31 GMT
Server: Apache/1.3.12
Cache-Control: max-age=43200
Expires: Wed, 14 Feb 2001 02:50:31 GMT
Last-Modified: Sun, 03 Dec 2000 23:52:56 GMT
ETag: "1cbf3-dfd-3a2adcd8"
Accept-Ranges: bytes
Content-Length: 3581
Connection: close
Content-Type: text/html

In the example above, the server returns a header message Cache-Control: max-age=43200. This immediately informs the cache that the object can be stored in cache for up to 12 hours. This 12-hour time limit is further guaranteed by the Expires header, which is set to a date value that is exactly 12 hours ahead of the value set in the Date header message.[7]

These two examples present two variations of web server responses containing information that makes the requested content either completely non-cacheable or cacheable only for a very specific period of time.

How does caching work?

Content is cached by devices on the internet, and these devices then serve this stored content when the same file is requested by the original client or another client that uses that same cache. This rather simplistic description covers a number of different cache scenarios, but two will be the focus of this paper browser caching and caching servers.[8]

For the remainder of this paper, the caching environment that will be discussed is one involving a network with a number of clients using a single cache server, the general internet, and a server network with a series of web servers on it.

Browser Caching

Browser caching is what most people are familiar with, as all web browsers perform this behavior by default. With this type of caching, the web browser stores a copy of the requested files in a cache directory on the client machine in order to help speed up page downloads. This performance increase is achieved by serving stored files from this directory on the local hard drive instead of retrieving these same files from the web server, which resides across a much slower connection than the one between the hard-drive and the local application, when an item that is stored in cache is requested.

To ensure that old content is not being served to the client, the browser checks its cache first to see if an item is in cache. If the item is in cache, the browser then confirms the state of the object in cache with the origin server to see if the item has been modified at the source since the browser last downloaded it. If the object has not been modified, the origin server sends a 304 Not Modified message, and the item is served from the local hard drive and not across the much slower internet.

First Request for a file

GET /file.html HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-powerpoint, application/vnd.ms-excel, application/msword, application/x-comet, */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
Host: 24.5.203.101
Connection: Keep-Alive

HTTP/1.1 200 OK
Date: Tue, 13 Feb 2001 20:00:22 GMT
Server: Apache
Cache-Control: max-age=604800
Last-Modified: Wed, 29 Nov 2000 15:28:38 GMT
ETag: "1df-28f1-3a2520a6"
Accept-Ranges: bytes
Content-Length: 10481
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html

In the above example[9], the file is retrieved from the server for the first time, and the server sends a 200 OK response and then returns the requested file. The items marked in blue indicate cache control data sent to the client by the server.

Second Request for a file

GET /file.html HTTP/1.1
Accept: */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
If-Modified-Since: Wed, 29 Nov 2000 15:28:38 GMT
If-None-Match: "1df-28f1-3a2520a6"
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
Host: 24.5.203.101
Connection: Keep-Alive

HTTP/1.1 304 Not Modified
Date: Tue, 13 Feb 2001 20:01:07 GMT
Server: Apache
Connection: Keep-Alive
Keep-Alive: timeout=5, max=100
ETag: "1df-28f1-3a2520a6"
Cache-Control: max-age=604800

The second request for a file sees the client send a request for the same object 40 seconds later, but with two additions. The server asks if the file has been modified since the last time it was requested by the client (If-Modified-Since). If the date in that field cannot be used by the origin server to confirm the state of the requested object, the client asks if the objects Etag tracking code has changed using the If-None-Match header message.[10] The origin server responds by verifying that object has not been modified and confirms this by returning the same Etag value that was sent by the client. This rapid client-server exchange allows the browser to quickly determine that it can serve the file directly from its local cache directory.

Caching Server

A caching server performs functions similar to those of a browser cache, only on a much larger scale. Where a browser cache is responsible for storing web objects for a single browser application on a single machine, a cache server stores web objects for a larger number of clients or perhaps even an entire network. With a cache server, all web requests from a network are passed through caching server, which then will serve the requested files to the client. The cache server can deliver content either directly from its own cache of objects, or by retrieving objects from the internet and then serving them to clients. [11]

Cache servers are a more efficient than browser caches as this network-level caching process makes the object available to all users of the network once it has been retrieved. With a browser cache, each user and, in fact, each browser application on a specific client must maintain a unique cache of files that is not shared with other clients or applications.

Also, cache servers use additional information provided by the web server in the headers sent along with each web request. Browser caches simply re-validate content with each request, confirming that the content has not been modified since it was last requested. Cache servers use the values sent in the Expires and Cache-Control header messages to set explicit expiry times for objects they store.

First Request for a file through a cache server

GET http://24.5.203.101/file.html HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-powerpoint, application/vnd.ms-excel, application/msword, application/x-comet, */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
Host: 24.5.203.101
Proxy-Connection: Keep-Alive

HTTP/1.0 200 OK
Date: Tue, 16 Jan 2001 15:46:42 GMT
Server: Apache
Cache-Control: max-age=604800
Last-Modified: Wed, 29 Nov 2000 15:28:38 GMT
ETag: "1df-28f1-3a2520a6"
Content-Length: 10481
Content-Type: text/html
Connection: Close

The first request from the client through a cache server shows two very interesting things.[12] The first is that although the client request was sent out as HTTP/1.1, the server responded using HTTP/1.0. The browser caching example above demonstrated that the responding server uses HTTP/1.1. The change in protocol is the first clue that this data was served by a cache server.

The second item of interest is that the file that is initially served by the proxy server has a Date field set to January 16, 2001. This server is not serving stale data; this is the default time set by the cache server to indicate a new object that has been inserted in the cache.[13]

Second Request for a file through a cache server Second Browser

GET http://24.5.203.101/file.html HTTP/1.1
Host: 24.5.203.101
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; 0.7) Gecko/20010109
Accept: */*
Accept-Language: en
Accept-Encoding: gzip,deflate,compress,identity
Keep-Alive: 300
Connection: keep-alive

HTTP/1.0 200 OK
Date: Tue, 16 Jan 2001 15:46:42 GMT
Server: Apache
Cache-Control: max-age=604800
Last-Modified: Wed, 29 Nov 2000 15:28:38 GMT
ETag: "1df-28f1-3a2520a6"
Content-Length: 10481
Content-Type: text/html
Connection: Close

Third Request for a file through a cache server Second Client Machine

GET http://24.5.203.101/file.html HTTP/1.0
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-powerpoint,
application/vnd.ms-excel, application/msword, */*
Accept-Language: en-us
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
Host: 24.5.203.101
Proxy-Connection: Keep-Alive

HTTP/1.0 200 OK
Date: Tue, 16 Jan 2001 15:46:42 GMT
Server: Apache
Cache-Control: max-age=604800
Last-Modified: Wed, 29 Nov 2000 15:28:38 GMT
ETag: "1df-28f1-3a2520a6"
Content-Length: 10481
Content-Type: text/html
Connection: Close

A second request through the cache server, using another browser on the same client configured to use the cache server, indicates that this client retrieved the file from the cache server, not from the origin server. The Date field is the same as the initial request and the protocol has once again been swapped from HTTP/1.1 to HTTP/1.0.

The third example shows that the object is now not only available to different browsers on the same machine, but now that it is available to different machines on the same network, using the same cache server. By requesting the same content from another client machine on the same network, it is clear that the object is served to the client by the cache server, as the Date field set to the same value observed in the previous two examples.

Why should data be cached?

Many web pages that are downloaded by web browsers today are marked as being non-cacheable. The theory behind this is that there is so much dynamic and personalized content on the internet today that if any of it is cached, people using the web may not have the freshest possible content or they may end up receiving content that was personalized for another client making use of the same cache server.

The dynamic and personalized nature of the web today does make this a challenge, but if the design of a web-site is examined closely, it can be seen that these new features of the web can work hand-in-hand with content caching.

How does caching the perceived user experience? In both the browser caching and caching server discussions above, it has been demonstrated that caching helps attack the problem of internet performance on three fronts. First, caching moves content closer to the client, by placing it on local hard-drives or in local network caches. With data stored on or near the client, the network delay encountered when trying to retrieve the data is reduced or eliminated.

Secondly, caching reduces network traffic by serving content that is fresh as described above. Cache servers will attempt to confirm with the origin server that the objects stored in cache if not explicitly marked for expiry are still valid and do not need to be fully re-loaded across the internet. In order to gain the maximum performance benefit from object caching, it is vital to specify explicit cache expiry dates or periods.

The final performance benefit to properly defining caching configurations of content on an origin server is that server load is reduced. If the server uses carefully planned explicit caching policies, server load can be greatly reduced, improving the user experience.

When examining how the configuration of a web server can be modified to improve content cacheability, it is important keep in mind two very important considerations. First, the content and site administrators must have a very granular level of control over how the content being served will or wont be cached once it leaves their server. Secondly, within this need to control how content is cached, ways should be found to minimize the impact that client requests have on bandwidth and server load by allowing some content to be cached.

Take the example of a large, popular site that is noted for its dynamic content and rich graphics. Despite having a great deal of dynamic content, caching can serve a beneficial purpose without compromising the nature of the content being served. The primary focus of the caching evaluation should be on the rich graphical content of the site.

If the images of this site all have unique names that are not shared by any other object on the site, or the images all reside in the same directory tree, then this content can be marked differently within the server configuration, allowing it to be cached.[14] A policy that allows these objects to be cached for 60, 120 or 180 seconds could have a large affect on reducing the bandwidth and server strain at the modified site. During this seemingly short period of time, several dozen of even several hundred different requests for the same object could originate from a large corporate network or ISP. If local cache servers can handle these requests, both the server and client sides of the transaction could see immediate performance improvements.

Taking a server header from an example used earlier in the paper, it can be demonstrated how even a slight change to the server header itself can help control the caching properties of dynamic content.

Dynamic Content

HTTP/1.1 200 OK
Date: Tue, 13 Feb 2001 14:50:31 GMT
Server: Apache/1.3.12
Cache-Control: no-cache, must-revalidate
Expires: Sat, 13 Jan 2001 14:50:31 GMT
Last-Modified: Sun, 03 Dec 2000 23:52:56 GMT
ETag: "1cbf3-dfd-3a2adcd8"
Accept-Ranges: bytes
Content-Length: 3581
Connection: close
Content-Type: text/html

Static Content

HTTP/1.1 200 OK
Date: Tue, 13 Feb 2001 14:50:31 GMT
Server: Apache/1.3.12
Cache-Control: max-age=43200, must-revalidate
Expires: Wed, 14 Feb 2001 02:50:31 GMT
Last-Modified: Sun, 03 Dec 2000 23:52:56 GMT
ETag: "1cbf3-dfd-3a2adcd8"
Accept-Ranges: bytes
Content-Length: 3581
Connection: close
Content-Type: text/html

As can been seen above, the only difference in the headers sent with the Dynamic Content and the Static Content are the Cache-Control and Expires values. The Dynamic Content example sets Cache-Control to no-cache, must-revalidate and Expires to one month in the past. This should prevent any cache from storing this data or serving it when a request is received to retrieve the same content.

The Static Content modifies these two settings, making the requested object cacheable for up to 12 hours Cache-Control value set to 43,200 seconds and an Expires value that is exactly 12 hours in the future. After the period specified, the browser cache or caching server must re-validate the content before it can be served in response to local requests.

The must-revalidate item is not necessary, but it does add additional control over content. Some cache servers will attempt to serve content that is stale under certain circumstances, such as if the origin server for the content cannot be reached. The must-revalidate setting forces the cache server to re-validate the stale content, and return an error if it cannot be retrieved.

Differentiating caching policies based on the type of content served allows a very granular level of control over what is not cached, what is cached, and for how long the content can be cached for and still be considered fresh. In this way, server and web administrators can improve site performance a little or no additional development or capital cost.

It is very important to note that defining specific server-side caching policies will only have a beneficial affect on server performance if explicit object caching configurations are used. The two main types of explicit caching configurations are those set by the Expires header and the Cache-Control family of headers as seen in the example above. If no explicit value is set for object expiry, performance gains that might have been achieved are eliminated by a flood of unnecessary client and cache server requests to re-validate unchanged objects with the origin server.

Conclusion

Despite the growth of dynamic and personalized content on the web, there is still a great deal of highly cacheable material that is served to clients. However, many sites do not take advantage of the performance gains that can be achieved by isolating the dynamic and personalized content of their site from the relatively static content that is served alongside it.

Using the inherent ability to set explicit caching policies within most modern web-server applications, objects handled by a web-sever can be separated into unique content groups. With distinct caching policies for each defined group of web objects, the web-site administrator, not the cache administrator, has control over how long content is served without re-validation or re-loading. This granular control of explicit content caching policies can allow web-sites to achieve noticeable performance gains with no additional outlay for hardware and software.


Endnotes

[1] The proximity that is referred to here is network proximity, not physical proximity. For example, AOLs network has some of the worlds largest cache servers and they are concentrated in Virginia; however, because of the structure of AOLs network, these cache servers are not far from the client.

[2] A re-verify is preferred as it consumes less bandwidth than a full re-load of the object from the origin server. With a re-verification, the origin server just confirms that the file is still valid and the cache server can simply reset the timer on the object.

[3] An HTTP header message is a data-control message sent by a web client or a web server to indicate a variety of data transmission parameters concerning the requests being made. Caching information is included in the information sent to and from the server.

[4] There are actually a substantially larger number of header messages that can be applied to a client or a server data transmission to communicate caching information. The most up-to-date list of the messages can be found in section 13 of RFC 2616, Hypertext Transfer Protocol — HTTP/1.1.

[5] A complete listing of the Cache-Control settings can be found in RFC 2616, Hypertext Transfer Protocol — HTTP/1.1, section 14.9.

[6] The initial file request that generated this header was sent on February 12, 2001.

[7] The Date header message indicates the date and time on the origin server when it responded to the request.

[8] A third type of caching, Reverse Caching or HTTPD Accelerators, are used at the server side to place highly cacheable content into high-speed machines that use solid-state storage to make retrieval of these objects very fast. This reduces the load on the web servers and allows them to concentrate on the generation of dynamic and personalized content.

[9] The data shown here is just application trace data.

[10] The Etag or entity tag is used to identify specific objects on a web server. Each item has unique Etag value, and this value is changed each time the file is modified. As an example, the Etag for a local web file was captured. This data was re-captured after the file was modified two carriage returns were inserted.

Test 1 Original File

ETag: "21ccd-10cb-399a1b33"

Test 2 Modified File

ETag: "21ccd-10cd-3a8c0597"

[11] This is where the other name for a cache server comes from, as the cache server acts as a proxy for the client making the request. The term proxy server is outdated as the term proxy assumes that the device will do exactly as the client requests; this is not always the case due to the security and content control mechanisms which are a part of all cache servers today. The client isnt always guaranteed to receive the complete content they requested. Infact, many networks do not allow any content into the network that does not first go through the cache devices on that network.

[12] The data shown here is just application trace data. For a more complete example of what the application and network properties of a web object retrieval are, please see Appendix A and B.

[13] All the data captures used in this example were taken on February 11-14, 2001.

[14] The description used here is based on the configuration options available with the Apache/1.3.x server family, which allows caching options to be set down to the file level. Other server applications may vary in their methods of applying individual caching policies to different sets of content on the same server.

Tags: , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,

Microsoft: The NSA Made us do it!

August 17th, 2006 by smp | Comments | Filed in Technology, Web Performance

Apparent using HTTP compression alongside HTTP/1.1 will cause certain versions of MSIE 6.0 to implode. [here]

I personally think this was because the NSA power shortage was making it too hard for the spooks to snoop on compressed Web traffic. [here]

Via: Port80 Software

PS: No, I won’t turn off compression because Microsoft did something really stupid.

Technorati Tags: , , , , , , ,

Tags: , , , , , , , , , , , , , , , , , , , , ,

Web Performance: Some posts of interest

August 1st, 2006 by smp | Comments | Filed in Web Performance

This morning’s bounty of posts brought in two that will make you think.

First was Port80 Software’s comments on using the Cache-Control mechanism embedded in all browsers. This is interesting to read, as I have been trying to get companies to use this mechanism more intelligently for a number of years. I know that the Port80 team gets it, but it is always nice to have some outside validation of a position you have tried to evangelize for a long time.

The second was Tim O’Reilly’s post on Cal Henderson’s new book on Web scalability. While I am likely to purchase the book for a professional interest, I have one problem with Flickr’s current configuration: static.flickr.com does not use HTTP persistence, something I noted last week. This strikes me as weird.

It’s always good to see Web performance rear its head.

Technorati Tags: , , , , , ,

Tags: , , , , , , , , , , , , , , , , , , , , , , , ,

Web Performance — Flickr: Do you want to get faster?

July 21st, 2006 by smp | Comments | Filed in Web Performance

Dear Flickr:

I have been wondering for sometime why downloads from your site seemed a little sluggish at times.

At first I blamed your unprecedented growth and success. For a little Vancouver startup (I am a BC boy myself), your entrance onto the stage of social networking applications has been phenomenal. The move from zero to infinity may have played a part in the performance I was seeing.

Nope. There was something else going on; I could see it every time I loaded a Flickr page in my browser. There was something else going on.

So today, I checked something out, and found the problem.

You need to enable persistent TCP connection on the static.flickr.com servers.

Now, that is the simple answer. I know that with large, web-based applications, enabling something as monumental as persistent connections could cause serious issues. If the architecture of the system was not designed to handle persistent connections, turning them on could cause the entire system to collapse.

There are legitimate, if mis-guided, reasons for disabled persistent connections. Some administrators believe that it is actually more efficient to have a client open a connection for every object. Easier to manage state, etc. The only problem is that in order to do that, you have to tune the systems serving data to shorten the amount of time a closed connection spends in a TIME_WAIT state.

When a TCP connection is closed, the socket is not immediately closed by the system in a default configuration. The TIME_WAIT state is the holding pen that these connections are pushed into. While in this state, the socket is locked and this may count against the incoming TCP connection queue, forcing the network stack to delay or reset new incoming connections.

Still, as Flickr is a worldwide company, the delay that the lack of persistent connections injects is astounding for locations in Asia. If you want to grow your business, and support more services, this will likely become a bottleneck very quickly.

Have a great weekend!

smp

Technorati Tags: , , , , ,

Tags: , , , , , , , , , , , , , , , , , , , , , , , , , , ,