Posts Tagged ‘DNS’

Web Performance: A Review of Steve Souders’ High Performance Web Sites

September 15th, 2008 by smp | Comments | Filed in The Web, Web Performance, WebPerformance.Org

It’s not often as a Web performance consultant and analyst that I find a book that is useful to so many clients. It is rarer still to discover a book that can help most Web sites improve their response times and consistency in fewer than 140 pages.

Steve Souders’ High Performance Web Sites (O’Reilly, 2007 - Companion Site) captures the essence of one side of the Web performance problem succinctly and efficiently, delivering a strong message to a group he classifies as front-end engineers. It is written in a way that can be understood by marketing, line-of-business, and technical teams, and in a manner designed to provoke discussions within an organization, with the ultimate goal of improving Web performance.

Once these discussions have started, there may be some shock within these very organizations, not only at the ease with which these rules can be implemented, but also at the realization that the fourteen rules in this book will only take you so far.

The 14 Rules

Web performance, in Souders’ world, can be greatly improved by applying his fourteen Web performance rules. For the record, the rules are:

Rule 1 - Make Fewer HTTP Requests
Rule 2 - Use a Content Delivery Network
Rule 3 - Add an Expires Header
Rule 4 - Gzip Components
Rule 5 - Put Stylesheets at the Top
Rule 6 - Put Scripts at the Bottom
Rule 7 - Avoid CSS Expressions
Rule 8 - Make JavaScript and CSS External
Rule 9 - Reduce DNS Lookups
Rule 10 - Minify JavaScript
Rule 11 - Avoid Redirects
Rule 12 - Remove Duplicate Scripts
Rule 13 - Configure ETags
Rule 14 - Make AJAX Cacheable

From the Companion Site [here]

These rules seem simple enough. In fact, most of them are easy to understand and, in an increasingly complex technical world, easy to implement. The most fascinating thing about the lessons in this book, for the people who think about these things every day, is that they are pieces of basic knowledge, tribal wisdom, that have been passed down for as long as the Web has existed.

Conceptually, the rules can be broken down to:

  • Ask for fewer things
  • Move stuff closer
  • Make things smaller
  • Make things less confusing

These four things are simple enough to understand, as they emphasize simplicity over complexity.

For Web site designers, these fourteen rules are critical to understanding how to drive better performance not only in existing Web sites, but in all of the sites developed in the future. They provide a vocabulary to those who are lost when discussions of Web performance occur. The fourteen rules show that Web performance can be improved, and that something can be done to make things better.

Beyond the 14 Steps

There is, however, a deeper, darker world beneath the fourteen rules. A world where complexity and interrelated components make change difficult to accomplish.

In a simple world, the fourteen rules will make a Web site faster. There is no doubt about that. They advocate for the reduction of object size (for text objects), the location of content closer to the people requesting it (CDNs), and the optimization of code to accelerate the parsing and display of Web content in the browser.

Deep inside a Web site live the presentation and application code, the guts that keep a site running. These layers, down below the waterline, are responsible for the heavy lifting: the personalization of a bank account display, the retrieval of semantic search results, and the processing of complex, user-defined transactions. The data that is bounced inside a Web application flows through a myriad of network devices (firewalls, routers, switches, application proxies, etc.) that can be as complex as, if not more complex than, the network involved in delivering the content to the client.

It is fair to say that a modern Web site is the proverbial duck in a strong current.

The fourteen rules are lost down here beneath the Web layer. In these murky depths, far from the flash and glamor, parsing functions that are written poorly, database tables without indices, and internal networks that are poorly designed can all wreak havoc on a site that has taken all fourteen rules to heart.

When the content that is not directly controlled and managed by the Web site is added into this boiling stew, another layer of possible complexity and performance challenge appears. Third parties, CDNs, advertisers, helper applications all come from external sources that are relied on to have taken not only the fourteen rules to heart, but also to have considered how their data is created, presented, and delivered to the visitors to the Web site that appears to contain it.

Remember the Complexity

High Performance Web Sites is a volume (a pamphlet really) that delivers a simple message: there is something that can be done to improve the performance of a Web site. Souders’ fourteen rules capture the items that can be changed quickly and at low cost.

However, if you ask Steve Souders if this is all you need to do to have a fast, efficient, and reliable Web site, he should say no. The fourteen rules are an excellent start, as they handle a great deal of the visible disease that infects so many Web sites.

However, like the triathlete with an undiagnosed brain tumor, there is a lot more under the surface that needs to be addressed in order to deliver Web performance improvements that can be seen by all, and support rapid, scalable growth.

This is a book that must be read. Then deeper questions must be asked to ensure that the performance of the 90% of a Web site design not seen by visitors matches the 10% that is.


DNS: Without it, your site does not exist

September 5th, 2008 by smp | Comments | Filed in The Web, Web Performance, WebPerformance.Org, Work

In my presentations and consultations on Web performance, I emphasize the importance of a correctly configured DNS system with the phrase: “If people can’t resolve your hostname, your site is dead in the water”.

Yesterday, it appears that the large anti-virus and security firm Sophos discovered this lesson the hard way.

Of course hindsight is perfect, so I won’t dwell for too long on this single incident. The lesson to be learned here is that DNS is complex and critical, yet is sometimes overlooked when considering the core issues of Web performance and end-user experience.

This complexity means that if an organization is not comfortable managing its own DNS, or wants to broaden and deepen its DNS infrastructure, there are a large number of firms that will assist with this process. The entire business of these firms is based on managing large-scale DNS implementations for organizations.

DNS is critical. Never take it for granted.


Chrome and Advertising - Google’s Plan

September 3rd, 2008 by smp | Comments | Filed in Blogging, The Web, Web Performance, Work

Since I downloaded and started using Chrome yesterday, I have had to rediscover the world of online advertising. Using Firefox and Adblock Plus for nearly three years has shielded me from the existence of online ads for the most part.

Stephen Noble, in a post on the Forrester Blog for Interactive Marketing Professionals, seems to discover that Chrome will be a source for injecting greater personalization and targeting into the online advertising market.

This is the key reason Chrome exists, right now.

While there may be discussions about the online platform and hosted applications, only a small percentage of Internet users rely on hosted desktop-like applications, excluding email, in their daily work and life.

However, Google’s biggest money-making ventures are advertising and search. With control of AdSense and DoubleClick, there is no doubt that Google controls a vast majority of the targeted and contextual advertising market, around the world.

One of the greatest threats to this money-making is a lack of control of the platform through which ads are delivered. There is talk of IE8 blocking ads (well, non-Microsoft ads anyway), and one of the more popular extensions for Firefox is Adblock Plus. While Safari doesn’t have this ability natively built in, it can be supported by any number of applications that, in the name of Internet security, filter and block online advertisers using end-user proxies.

This threat to Google’s core revenue source was not ignored in the development of Chrome. One of the options is the use of DNS pre-fetching. Now, I haven’t thrown up a packet sniffer, but what’s to prevent part of the pre-fetching algorithm from going beyond DNS for certain content and pre-fetching the whole object, so that the ads load really fast and in that way are seen as less intrusive?

Ok, so I am noted for having a paranoid streak.

However, using the fastest rendering engine and a rocket-ship fast Javascript VM is not only good for the new generation of online Web applications, but plays right into the hands of improved ad-delivery.

So, while Chrome is being hailed as the first Web application environment, it is very much a contextual Web advertising environment as well.

It’s how it was built.


Baseline Testing With cURL

October 3rd, 2006 by smp | Comments | Filed in Web Performance, WebPerformance.Org

cURL is an application that can be used to retrieve any Internet file that uses the standard URL format — http://, ftp://, gopher://, etc. Its power and flexibility can be added to applications by using the libcurl library, whose API can be accessed easily using most of the commonly used scripting and programming languages.

So, how does cURL differ from some of the other command-line URL retrieval tools such as WGET? Both do very similar things, and can be coaxed to retrieve large lists of files or even mirror entire Web sites. In fact, for the automated retrieval of single files from the Internet for storage on local filesystems, such as downloading source files onto servers for building applications, WGET’s syntax is the simplest to use.

However, for simple baseline testing, WGET lacks cURL’s ability to produce timing results that can be written to an output file in a user-configurable format. cURL gathers a large amount of data about a transfer that can then be used for analysis or logging purposes. This makes it a step ahead of WGET for baseline testing.

cURL Installation

For the purposes of our testing, we have used cURL 7.10.5-pre2 as it adds support for downloading and interpreting GZIP-encoded content from Web servers. Because it is a pre-release version, it is currently only available as source for compiling. The compilation was smooth, and straight-forward.

$ ./configure --with-ssl --with-zlib
$ make
$ make test

[...runs about 120 checks to ensure the application and library will work as expected..]

# make install

The application installed in /usr/local/bin on my RedHat 9.0 laptop.

Testing cURL is straight-forward as well.

$ curl http://slashdot.org/

[...many lines of streaming HTML omitted...]

Variations on this standard theme include:

  • Send output to a file instead of STDOUT
    $ curl -o ~/slashdot.txt http://slashdot.org/
  • Request compressed content if the Web server supports it
    $ curl --compressed http://slashdot.org/
  • Provide total byte count for downloaded HTML
    $ curl -w %{size_download} http://slashdot.org/

    Baseline Testing with cURL

    With the application installed, you can now begin to design a baseline test. This methodology is NOT a replacement for true load testing, but rather a method for giving small and medium-sized businesses a sense of how well their server will perform before it is deployed into production, as well as providing a baseline for future tests. This baseline can then be used as a basis for comparing performance after configuration changes in the server environment, such as caching rule changes or adding solutions that are designed to accelerate Web performance.

    To begin, a list of URLs needs to be drawn up and agreed to as a baseline for the testing. For my purposes, I use the files from the Linux Documentation project, intermingled with a number of images. This provides the test with a variety of file sizes and file types. You could construct your own file-set out of any combination of documents/files/images you wish. However, the file-set should be large — mine runs to 2134 files.

    Once the file-set has been determined, it should be archived so that this same group can be used for future performance tests; burning it to a CD is always a safe bet.

    Next, extract the filenames to a text file so that the configuration file for the tests can be constructed. I have done this for my tests, and have it set up in a generic format so that when I construct the configuration for the next test, I simply have to change/update the URL to reflect the new target.

    The configuration of the rest of the parameters should be added to the configuration file at this point. These are all the same as the command line versions, except for the URL listing format.

    Listing of test_config.txt:

    -A "Mozilla/4.0 (compatible; cURL 7.10.5-pre2; Linux 2.4.20)"
    -L
    -w @logformat.txt
    -D headers.txt
    -H "Pragma: no-cache"
    -H "Cache-control: no-cache"
    -H "Connection: close"
    
    url="http://www.foobar.com/1.html"
    url="http://www.foobar.com/2.png"
    [...file listing...]

    In the above example, I have set cURL to:

    • Use a custom User-Agent string
    • Follow any re-direction responses that contain a “Location:” response header
    • Dump the server response headers to headers.txt
    • Circumvent cached responses by sending the two main “no-cache” request headers
    • Close the TCP connection after each object is downloaded, overriding cURL’s default use of persistent connections
    • Format the timing and log output using the format that is described in logformat.txt
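For a long file list, the configuration file can be generated rather than edited by hand. The following is a minimal Python sketch of that step, offered as an illustration only: build_curl_config and OPTIONS are hypothetical names, and it assumes the extracted filenames are paths relative to the root of the target site.

```python
# Sketch: build the text of a curl -K config file from a list of filenames.
# build_curl_config and OPTIONS are illustrative names, not part of cURL.

OPTIONS = [
    '-A "Mozilla/4.0 (compatible; cURL 7.10.5-pre2; Linux 2.4.20)"',
    "-L",
    "-w @logformat.txt",
    "-D headers.txt",
    '-H "Pragma: no-cache"',
    '-H "Cache-control: no-cache"',
    '-H "Connection: close"',
]


def build_curl_config(base_url, filenames, options=OPTIONS):
    """Return the shared options followed by one url="..." line per file."""
    base = base_url.rstrip("/")
    lines = list(options)
    lines.append("")  # blank line between the options and the URL list
    for name in filenames:
        lines.append('url="{}/{}"'.format(base, name.lstrip("/")))
    return "\n".join(lines) + "\n"
```

Calling build_curl_config("http://www.foobar.com", ["1.html", "2.png"]) and writing the result to test_config.txt reproduces the listing above; pointing the next test at a new target only requires changing the first argument.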

    Another command-line option that I use a lot is --compressed, which, as of cURL 7.10.5, handles both the deflate and gzip encoding of Web content, including decompression on the fly. This is great for comparing the performance improvements and bandwidth savings from compression solutions against a baseline test without compression. Network administrators may also be interested in testing the improvement that they get using proxy servers and client-side caches by inserting --proxy <proxy[:port]> into the configuration, removing the “no-cache” headers, and testing a list of popular URLs through their proxy servers.

    The logformat.txt file describes the variables that I find of interest and that I want to use for my analysis.

    Listing of logformat.txt:

    \n
    %{url_effective}\t%{http_code}\t%{content_type}\t%{time_total}\t%{time_namelookup}\t%{time_connect}\t%{time_starttransfer}\t%{size_download}\n
    \n

    These variables are defined as:

  • url_effective: URL used to make the final request, especially when following re-directions
  • http_code: HTTP code returned by the server when delivering the final HTML page requested
  • content_type: MIME type returned in the final HTML request
  • time_total: Total time for the transfer to complete
  • time_namelookup: Time from start of transfer until DNS Lookup complete
  • time_connect: Time from start of transfer until TCP connection complete
  • time_starttransfer: Time from start of transfer until data begins to be returned from the server
  • size_download: Total number of bytes transferred, excluding headers

    As time_namelookup, time_connect, and time_starttransfer are all cumulative from the beginning of the transfer, you have to do some math to come up with the actual values.

    DNS Lookup Time = time_namelookup
    TCP Connection Time = time_connect - time_namelookup
    Time to First Byte = time_starttransfer - time_connect
    Content Download Time = time_total - time_starttransfer
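The arithmetic above can be sketched as a small Python function. This is an illustration under stated assumptions: phase_times and its dictionary keys are hypothetical names, the parameters follow cURL's write-out variable names (time_namelookup is cURL's name for the DNS lookup timer), and the time remaining after the first byte is treated as the time spent receiving the response body.

```python
# Sketch: turn cURL's cumulative timers (seconds since the start of the
# transfer) into the duration of each phase. Names are illustrative.

def phase_times(time_namelookup, time_connect, time_starttransfer, time_total):
    """Return the time spent in each phase of a single transfer."""
    return {
        # DNS is the first phase, so its cumulative value is its duration.
        "dns_lookup": time_namelookup,
        # TCP handshake: from DNS completion to connection established.
        "tcp_connect": time_connect - time_namelookup,
        # Wait for the server: from connection to the first byte returned.
        "first_byte": time_starttransfer - time_connect,
        # Remainder: from the first byte until the transfer completes.
        "download": time_total - time_starttransfer,
    }
```

The four values always add back up to time_total, which is a quick sanity check on any row of results.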

    If you are familiar with cURL, you may wonder why I have chosen not to write the output to a file using the -o <file> option. It appears that this option only records the output for the first file requested, even in a large list of files. I prefer to use the following command to start the test and then post-process the results using grep.

    $ curl -K test_config.txt >> output_raw_1.txt
    
    [...lines and lines of output...]
    
    $ grep -i -r "^http://www.foobar.com/.*$" output_raw_1.txt >> output_processed_1.txt

    And voila! You now have a tab delimited file you can drop into your favorite spreadsheet program to generate the necessary statistics.
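As an alternative to a spreadsheet, the processed file can be summarized with a short script. The sketch below is an illustration only: load_results, summarize, and FIELDS are hypothetical names, and it assumes each record is a single line of eight tab-separated fields in the order url_effective, http_code, content_type, time_total, DNS lookup time, time_connect, time_starttransfer, size_download.

```python
# Sketch: summarize a tab-delimited cURL results file.
# FIELDS is an assumed column order; adjust it to match logformat.txt.
import statistics

FIELDS = [
    "url_effective", "http_code", "content_type", "time_total",
    "time_namelookup", "time_connect", "time_starttransfer", "size_download",
]
NUMERIC = FIELDS[3:]  # the timers and the byte count


def load_results(lines):
    """Parse an iterable of lines into dicts, skipping malformed lines."""
    rows = []
    for line in lines:
        parts = [p.strip() for p in line.rstrip("\n").split("\t")]
        if len(parts) != len(FIELDS):
            continue  # not a result record
        row = dict(zip(FIELDS, parts))
        try:
            for key in NUMERIC:
                row[key] = float(row[key])
        except ValueError:
            continue  # a numeric column did not contain a number
        rows.append(row)
    return rows


def summarize(rows, field="time_total"):
    """Return count, min, median, mean and max for one numeric column."""
    values = [row[field] for row in rows]
    if not values:
        raise ValueError("no records to summarize")
    return {
        "count": len(values),
        "min": min(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "max": max(values),
    }
```

Opening output_processed_1.txt and passing the file object to load_results, then the rows to summarize, gives the basic statistics for total transfer time; passing a different field name summarizes another column.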


    GrabPERF: You may experience some technical difficulties.

    May 26th, 2006 by smp | Comments | Filed in GrabPERF

    We are in the process of relocating the GrabPERF servers to new IP addresses. You may experience some weirdness as the DNS propagates, but this should die down in a couple of days.

    If you do hit an issue, you can still reach the server at http://208.66.64.70/.


    Motorola.com — OFFLINE!

    April 11th, 2006 by smp | Comments | Filed in Web Performance

    Motorola made some site changes today.

    The name servers they have listed at the TLD servers are:

    ftpbox.mot.com. [129.188.136.101] [TTL=172800] [US]
    motgate.mot.com. [129.188.136.100] [TTL=172800] [US]

    The Authoritative name servers that Motorola list are:

    ftpbox.mot.com. [129.188.136.9] [TTL=59719]

    motgate.mot.com. [129.188.136.100] [TTL=59719]

    Ummm…DNS is a vital thing. Screw it up, and YOU TAKE YOURSELF OFF THE INTERNET!
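The two listings above disagree on one address, and a small comparison makes that visible. The following Python sketch is an illustration only: compare_ns, DELEGATED, and AUTHORITATIVE are hypothetical names, the data is copied by hand from the listings rather than queried from DNS, and the post does not establish that this mismatch alone caused the outage.

```python
# Sketch: compare the name server addresses published at the TLD servers
# with the ones the zone itself publishes. Data copied from the listings.

DELEGATED = {
    "ftpbox.mot.com.": "129.188.136.101",
    "motgate.mot.com.": "129.188.136.100",
}
AUTHORITATIVE = {
    "ftpbox.mot.com.": "129.188.136.9",
    "motgate.mot.com.": "129.188.136.100",
}


def compare_ns(delegated, authoritative):
    """Return {host: (delegated_ip, authoritative_ip)} for each disagreement.

    A host that is missing from one side is reported with None on that side.
    """
    mismatches = {}
    for host in sorted(set(delegated) | set(authoritative)):
        tld_ip = delegated.get(host)
        auth_ip = authoritative.get(host)
        if tld_ip != auth_ip:
            mismatches[host] = (tld_ip, auth_ip)
    return mismatches
```

For the data above, compare_ns(DELEGATED, AUTHORITATIVE) reports only ftpbox.mot.com., whose address differs between the two sources.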

    If you can get to Motorola’s site…let me know.


    Statcounter: I can’t get to my stats — is this a problem?

    March 22nd, 2006 by smp | Comments | Filed in Web Performance

    I got a comment from someone yesterday saying that they couldn’t get to their Statcounter Web site stats. I thought it was an issue on the user end and he reported that he finally got in with another browser.

    Well, I just tried with all three of my browsers and I can’t get to the Statcounter stats that I pay for. This is somewhat distressing to me.

    And it appears to be some form of DNS issue.

    I have submitted an email to the Statcounter team and I hope to hear from them soon.

    Not a good start to my Wednesday.


    DNS | Apache Virtual Host Madness Today

    February 19th, 2006 by smp | Comments | Filed in Uncategorized

    I have noticed that GrabPERF has been responding more and more slowly of late. Well, I believe that I have resolved the performance issue: I moved the Web component of GrabPERF off of the machine where the database is housed.

    However, when I did this, I hit a really stupid issue that was the result of a legacy httpd.conf file directive.

    As well as GrabPERF, I moved this blog, also a heavy HTTP | PHP user, onto the same Web server. Then, once I had seen the DNS propagate, I went to this blog….and got the GrabPERF homepage!

    WTF!?!?!?!

    Turns out that I was the victim of a REALLY dense misconfiguration, which I have now removed from the new Web server configuration file. I had buried the NameVirtualHost directive in a VirtualHost container, and that container was not part of the new server’s config file.

    Without the NameVirtualHost directive set, the server happily responds to all incoming requests with the first VirtualHost it finds in the httpd.conf file, which in this case was GrabPERF.

    Once I solved this, and placed the NameVirtualHost directive outside of all of the VirtualHost containers, the server began working perfectly.

    I then went and retrofitted the secondary Web server.

    If none of this makes sense, it’s ok. I am not feeling real lucid right now.


    The final move has occurred

    February 10th, 2006 by smp | Comments | Filed in GrabPERF, Life, Technology

    So, the move to the new datacenter is complete. We finished off the final changes last night | early this morning, and the Web server and database are now running on a big fat pipe at 365 Main in downtown San Francisco.

    How did I spring for new hardware and hot hosting? Well, I had a little help from some Friends of GrabPERF — Technorati.

    About 3 months ago, Dave Sifry contacted me when we went through our last financial crisis and offered to host the whole kit and kaboodle. He put me in contact with Adam Hertz, who turned me over to Camille Riddle.

    After driving Camille and her team nuts for two months, we switched everything over starting at about 23:30 EST (04:30 GMT) last night. There were a few hiccups and burps, and the DNS propagation may take a while to reach some of the most distant folks, but this morning, I issued the final “poweroff” command to my home-based Web server.

    I want to thank the Technorati team for all of their help, and I look forward to continuing to deliver quality data to the blog community.

    Niall Kennedy talks about the Technorati side here, along with a funky pic of the old GrabPERF datacenter.


    GrabPERF: Agent Location Disabled

    September 15th, 2005 by smp | Comments | Filed in smp

    This morning, I asked the ERTW.com measurement location to turn down, as we have completed testing the remote measurement code.

    This will have some effect on results going forward, mostly positive. The ERTW.com location had an unusual DNS configuration which was affecting the overall measurement statistics.

    I am still recruiting for measurement locations on the West Coast. Drop me an e-mail or leave a comment if you are interested in hosting a measurement location on your Linux server.
