Posts Tagged ‘heavy tail’

Web Performance, Part IV: Finding The Frequency

August 30th, 2006 by smp | Filed in Web Performance, WebPerformance.Org

In the last article, I discussed the aggregated statistics used most frequently to describe a population of performance data.

[Figure: stats-articles, the aggregated statistics for the sample population]

The pros and cons of each of these aggregated values have been examined, but now we come to their largest single flaw: each one attempts to describe an entire population of numbers with a single value.

The only way to fully describe a population of numbers is to do one of two things: display every single datapoint in the population against the time it occurred, producing a scatter plot, or display the population as a statistical distribution.

The most common type of statistical distribution used with Web performance data is the Frequency Distribution. This type of display groups the measurements into value ranges (bins), then graphs the number of measurements that fall into each bin.
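As a rough sketch of how such a distribution is built, the Python below assumes response times recorded as integer milliseconds and equal-width bins. The helper name `frequency_distribution`, the bin width, and the sample values are illustrative assumptions, not the method or data behind the charts in this article.

```python
from collections import Counter


def frequency_distribution(samples_ms, bin_width_ms):
    """Count measurements per fixed-width bin (hypothetical helper).

    samples_ms   -- response times as integer milliseconds (assumption)
    bin_width_ms -- width of each value range, in milliseconds
    Returns (bin_start_ms, count) pairs in ascending order, including
    empty bins, so gaps in the distribution stay visible.
    """
    if not samples_ms:
        return []
    if bin_width_ms <= 0:
        raise ValueError("bin_width_ms must be positive")
    # Integer division maps each measurement to the index of its bin.
    counts = Counter(s // bin_width_ms for s in samples_ms)
    first, last = min(counts), max(counts)
    return [(i * bin_width_ms, counts.get(i, 0)) for i in range(first, last + 1)]


# Hypothetical measurements, printed as a crude text histogram.
samples = [880, 910, 950, 1020, 1210]
for start, count in frequency_distribution(samples, 100):
    print(f"{start:>5}-{start + 100:>5} ms | {'#' * count}")
```

Keeping the empty bins is a deliberate choice: dropping them would visually pull a far-out measurement right up against the main body of the chart and hide how long the tail really is.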

Using the same population data as the aggregated values above, the frequency distribution looks like this:

[Figure: stats-articles-frequency, the frequency distribution of the same population]

This gives deeper insight into the population by displaying the whole range of measurements, including the heavy tail that occurs in many Web performance result sets. A statistical heavy tail is essentially the same as Chris Anderson's long tail, but in statistical analysis a heavy tail indicates a data set that is not normally distributed, and it skews any aggregated values you try to produce from the population.
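To make the skew concrete, the sketch below uses a small, invented set of measurements (not the data behind the charts above) and compares the mean and median with and without a single 40-second outlier. It relies only on Python's standard `statistics` module.

```python
import statistics

# Invented sample: a tight body of response times around one second (ms)...
body = [900, 950, 1000, 1000, 1050, 1100, 1200, 1250]
# ...plus one heavy-tail measurement of 40 seconds.
with_tail = body + [40000]

for label, data in (("body only", body), ("with tail", with_tail)):
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(f"{label:>9}: mean {mean:8.1f} ms, median {median:8.1f} ms")
```

In this invented sample, one outlier out of nine measurements moves the mean from about 1.06 seconds to about 5.38 seconds, while the median only shifts from 1.025 to 1.05 seconds.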

As noted in the discussion of aggregated values, the 'average' performance likely falls between 0.88 and 1.04 seconds. When you compare these values to the frequency distribution, they make sense, because the largest concentration of measurements falls into this range.

However, the 85th percentile for this population is at 1.20 seconds, where there is a large secondary bulge in the frequency distribution. Beyond that, measurements trickle out as far as the 40-second range.
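The article does not say which percentile method produced the 1.20-second figure. As one common choice, the nearest-rank method sketched below returns the smallest measurement that has at least p percent of the population at or below it; other tools interpolate between ranks and can give slightly different values. The function name is hypothetical.

```python
import math


def percentile_nearest_rank(samples, p):
    """Return the p-th percentile of samples using the nearest-rank method.

    p must be in the range (0, 100]. The result is always an actual
    measurement from the population, never an interpolated value.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Rank (1-based) of the first value covering p percent of the samples.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]
```

For example, `percentile_nearest_rank(list(range(1, 21)), 85)` returns 17, since 17 of the 20 values are at or below it.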

As can be seen, a single aggregated number cannot represent all of the characteristics in a population of measurements. They are good representations, but that’s all they are.

So, to wrap up this whirlwind tour through the world of statistical analysis and Web performance data, always remember the old adage: Lies, Damn Lies, and Statistics.

In the next article, I will discuss the concept of performance baselining, and how it forms the basis for Web performance evolution.


Long Tail == Heavy Tail == The Beaver Effect

December 22nd, 2004 by smp | Filed in smp

The Long Tail concept seemed so familiar to me because I work with the marketing term's statistical cousin, the Heavy Tail.

The term Heavy Tail is used to describe a dataset that is not "normally" (in the statistical sense; think Bell Curve) distributed. Internet performance data is notoriously heavy-tailed, with a large concentration of datapoints on the left-hand side of the population and a long, heavy tail that tapers off very slowly into the nether reaches of "where things go very wrong".
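One standard way to put a number on this lopsided shape, swapped in here for illustration rather than taken from the post, is the coefficient of skewness: a symmetric bell curve scores close to zero, while a big body on the left with a long tail to the right scores positive. The sketch below uses the population (Fisher-Pearson) form, a hypothetical helper name, and invented data in seconds.

```python
import statistics


def skewness(samples):
    """Fisher-Pearson coefficient of skewness: mean((x - mu)^3) / sigma^3.

    Uses the population standard deviation; returns 0.0 when all
    samples are identical (no spread, so no skew).
    """
    mu = statistics.fmean(samples)
    sigma = statistics.pstdev(samples)
    if sigma == 0:
        return 0.0
    third_moment = sum((x - mu) ** 3 for x in samples) / len(samples)
    return third_moment / sigma ** 3


# Invented data in seconds: symmetric versus "huge body, long tail".
symmetric = [0.8, 0.9, 1.0, 1.1, 1.2]
heavy_tailed = [0.9, 0.9, 1.0, 1.0, 1.1, 1.2, 4.0, 40.0]
print(f"symmetric:    {skewness(symmetric):+.2f}")
print(f"heavy-tailed: {skewness(heavy_tailed):+.2f}")
```

Strictly, skewness measures lopsidedness; tail weight on its own is usually measured with kurtosis. For the right-leaning shape described here, a large positive skewness is a quick first signal.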

When I gave training classes, I described this (being Canadian, of course) as the Beaver Effect. If you are as puzzled as some of my seminar participants were, I am not surprised. However, go look at a picture of a beaver (none posted here; you know how to use Google).

Huge body; large tail. The Beaver Effect.

Guess it doesn’t resonate like the Long Tail.
