Posts Tagged ‘Web content’

Web Performance: A Review of Steve Souders’ High Performance Web Sites

September 15th, 2008 by smp | Comments | Filed in The Web, Web Performance, WebPerformance.Org

It’s not often as a Web performance consulatant and analyst that I find a book that is useful to so many clients. It’s much more rare to discover a book that can help most Web sites improve their response times and consistency in fewer than 140 pages.

Steve Souders’ High Performance Web Sites (O’Reilly, 2007 - Companion Site) captures the essence of one-side of the Web performance problem succinctly and efficiently, delivering a strong message to a group he classifies as front-end engineers. It is written in a way that can be understood by marketing, line-of-business, and technical teams. It is written in a manner designed to provoke discussions within an organization with the ultimate goal of improving Web performance

Once these discussion have started, there may some shock withing these very organizations. Not only with the ease with which these rules can be implemented, but by the realization that the fourteen rules in this book will only take you so far.

The 14 Rules

Web performance, in Souders’ world, can be greatly improved by applying his fourteen Web performance rules. For the record, the rules are:

Rule 1 - Make Fewer HTTP Requests
Rule 2 - Use a Content Delivery Network
Rule 3 - Add an Expires Header
Rule 4 - Gzip Components
Rule 5 - Put Stylesheets at the Top
Rule 6 - Put Scripts at the Bottom
Rule 7 - Avoid CSS Expressions
Rule 8 - Make JavaScript and CSS External
Rule 9 - Reduce DNS Lookups
Rule 10 - Minify JavaScript
Rule 11 - Avoid Redirects
Rule 12 - Remove Duplicate Scripts
Rule 13 - Configure ETags
Rule 14 - Make AJAX Cacheable

From the Companion Site [here]

These rules seem simple enough. And, in fact, most of them are easy to understand, and, in an increasingly complex technical world, easy to implement. In fact, the most fascinating thing about the lessons in this book, for the people who think about these things everyday, is that they are pieces of basic knowledge, tribal wisdom, that have been passed down for as long as the Web has existed.

Conceptually, the rules can be broken down to:

  • Ask for fewer things
  • Move stuff closer
  • Make things smaller
  • Make things less confusing

These four things are simple enough to understand, as they emphasize simplicity over complexity.

For Web site designers, these fourteen rules are critical to understanding how to drive better performance not only in existing Web sites, but in all of the sites developed in the future. They provide a vocabulary to those who are lost when discussions of Web performance occur. The fourteen rules show that Web performance can be improved, and that something can be done to make things better.

Beyond the 14 Steps

There is, however, a deeper, darker world beneath the fourteen rules. A world where complexity and interrelated components make change difficult to accomplish.

In a simple world, the fourteen rules will make a Web site faster. There is no doubt about that. They advocate for the reduction object size (for text objects), the location of content closer to the people requesting it (CDNs), and the optimization of code to accelerate the parsing and display of Web content in the browser.

Deep inside a Web site lives the presentation and application code, the guts that keep a site running. These layers, down below the waterline are responsible for the heavy lifting, the personalization of a bank account display, the retrieval of semantic search results, and the processing of complex, user-defined transactions. The data that is bounced inside a Web application flows through a myriad of network devices — firewalls, routers, switches, application proxies, etc — that can be as complex, if not more so, than the network complexity involved in delivering the content to the client.

It is fair to say that a modern Web site is the proverbial duck in a strong current.

The fourteen rules are lost down here beneath the Web layer. In these murky depths, far from the flash and glamor, parsing functions that are written poorly, database table without indices, internal networks that are poorly designed can all wreak havoc on a site that has taken all fourteen rules to heart.

When the content that is not directly controlled and managed by the Web site is added into this boiling stew, another layer of possible complexity and performance challenge appears. Third parties, CDNs, advertisers, helper applications all come from external sources that are relied on to have taken not only the fourteen rules to heart, but also to have considered how their data is created, presented, and delivered to the visitors to the Web site that appears to contain it.

Remember the Complexity

High Performance Web Sites is a volume (a pamphlet really) that delivers a simple message: there is something that can be done to improve the performance of a Web site. Souders’ fourteen rules capture the items that can be changed quickly, and at low-cost.

However, if you ask Steve Souders’ if this is all you need to do to have a fast, efficient, and reliable Web site, he should say no. The fourteen rules are an excellent start, as they handle a great deal of the visible disease that infects so many Web sites.

However, like the triathlete with an undiagnosed brain tumor, there is a lot more under the surface that needs to be addressed in order to deliver Web performance improvements that can be seen by all, and support rapid, scalable growth.

This is a book that must be read. Then deeper questions must be asked to ensure that the performance of the 90% of a Web site design not seen by visitors matches the 10% that is.

Tags: , , , , , , , , , , , , , , , , , , ,

Baseline Testing With cURL

October 3rd, 2006 by smp | Comments | Filed in Web Performance, WebPerformance.Org

cURL is an application that can be used to retrieve any Internet file that uses the standard URL format — http://, ftp://, gopher://, etc. Its power and flexibility can be added to applications by using the libcurl library, whose API can be accessed easily using most of the commonly used scripting and programming languages.

So, how does cURL differ from some of the other command-line URL retrieval tools such as WGET? Both do very similar things, and can be coaxed to retrieve large lists of files or even mirror entire Web sites. In fact, for the automated retrieval of single files for the Internet for storage on local filesystems — such as downloading source files onto servers for building applications — WGET’s syntax is the simplest to use.

However, for simple baseline testing, WGET lacks cURL’s ability to produce timing results that can be written to an output file in a user-configurable format. cURL gathers a large amount of data about a transfer that can then be used for analysis or logging purposes. This makes it a step ahead of WGET for baseline testing.

cURL Installation

For the purposes of our testing, we have used cURL 7.10.5-pre2 as it adds support for downloading and interpreting GZIP-encoded content from Web servers. Because it is a pre-release version, it is currently only available as source for compiling. The compilation was smooth, and straight-forward.

$ ./configure --with-ssl --with-zlib
$ make
$ make test

[...runs about 120 checks to ensure the application and library will work as expected..]

# make install

The application installed in /usr/local/bin on my RedHat 9.0 laptop.

Testing cURL is straight-forward as well.

$ curl http://slashdot.org/

[...many lines of streaming HTML omitted...]

Variations on this standard theme include:

  • Send output to a file instead of STDOUT
  • 	$ curl -o ~/slashdot.txt http://slashdot.org/
  • Request compressed content if the Web server supports it
  • 	$ curl --compressed http://slashdot.org/
  • Provide total byte count for downloaded HTML
  • 	$ curl -w %{size_download} http://slashdot.org/

    Baseline Testing with cURL

    With the application installed, you can now begin to design a baseline test. This methodology is NOT a replacement for true load testing, but rather a method for giving small and medium-sized businesses a sense of how well their server will perform before it is deployed into production, as well as providing a baseline for future tests. This baseline can then be used as a basis for comparing performance after configuration changes in the server environment, such as caching rule changes or adding solutions that are designed to accelerate Web performance.

    To begin, a list of URLs needs to be drawn up and agreed to as a baseline for the testing. For my purposes, I use the files from the Linux Documentation project, intermingled with a number of images. This provides the test with a variety of file sizes and file types. You could construct your own file-set out of any combination of documents/files/images you wish. However, the file-set should be large — mine runs to 2134 files.

    Once the file-set has been determined, it should be archived so that this same group can be used for future performance tests; burning it to a CD is always a safe bet.

    Next, extract the filenames to a text file so that the configuration file for the tests can be constructed. I have done this for my tests, and have it set up in a generic format so that when I construct the configuration for the next test, I simply have to change/update the URL to reflect the new target.

    The configuration of the rest of the parameters should be added to the configuration file at this point. These are all the same as the command line versions, except for the URL listing format.

  • Listing of test_config.txt
  • -A "Mozilla/4.0 (compatible; cURL 7.10.5-pre2; Linux 2.4.20)"
    -L
    -w @logformat.txt
    -D headers.txt
    -H "Pragma: no-cache"
    -H "Cache-control: no-cache"
    -H "Connection: close"
    
    url="http://www.foobar.com/1.html"
    url="http://www.foobar.com/2.png"
    [...file listing...]

    In the above example, I have set cURL to:

    • Use a custom User-Agent string
    • Follow any re-direction responses that contain a “Location:” response header
    • Dump the server response headers to headers.txt
    • Circumvent cached responses by sending the two main “no-cache” request headers
    • Close the TCP connection after each object is downloaded, overriding cURL’s default use of persistent connections
    • Format the timing and log output using the format that is described in logformat.txt

    Another command-line option that I use a lot is –compressed, which, as of cURL 7.10.5, handles both the deflate and gzip encoding of Web content, including decompression on the fly. This is great for comparing the performance improvements and bandwidth savings from compression solutions against a baseline test without compression. Network administrators may also be interested in testing the improvement that they get using proxy servers and client-side caches by inserting –proxy <proxy[:port]> into the configuration, removing the “no-cache” headers, and testing a list of popular URLs through their proxy servers.

    The logformat.txt file describes the variables that I find of interest and that I want to use for my analysis.

  • Listing of logformat.txt
  • n
    %{url_effective}t%{http_code}t%{content_type}t%{time_total}t%{time_lookup}t /
    	%{time_connect}t%{time_starttransfer}t{size_download}n
    n

    These variables are defined as:

  • url_effective: URL used to make the final request, especially when following re-directions
  • http_code: HTTP code returned by the server when delivering the final HTML page requested
  • content_type: MIME type returned in the final HTML request
  • time_total: Total time for the transfer to complete
  • time_lookup: Time from start of transfer until DNS Lookup complete
  • time_connect: Time from start of transfer until TCP connection complete
  • time_starttransfer: Time from start of transfer until data begins to be returned from the server
  • size_download: Total number of bytes transferred, excluding headers
  • As time_connect and time_starttransfer are cumulative from the beginning of the transfer, you have to do some math to come up with the actual values.

    TCP Connection Time = time_connect - time_lookup
    Time First Byte = time_starttransfer - time_connect
    Redirection Time = time_total - time_starttransfer

    If you are familiar with cURL, you may wonder why I have chosen not to write the output to a file using the -o <file> option. It appears that this option only records the output for the first file requested, even in a large list of files. I prefer to use the following command to start the test and then post-process the results using grep.

    $ curl -K test_config.txt >> output_raw_1.txt
    
    [...lines and lines of output...]
    
    $ grep -i -r "^http://www.foobar.com/.*$" output_raw_1.txt >> output_processed_1.txt

    And voila! You now have a tab delimited file you can drop into your favorite spreadsheet program to generate the necessary statistics.

    Tags: , , , , , , , , , , , , , , ,