Posts Tagged ‘statistics’

Blog Advertising: Toward a Better Model

September 18th, 2008 by smp | Comments | Filed in Blogging, The Web, advertising, social media

This week, I have been discussing the different approaches to blog analytics that can be used to determine which posts from a blog’s archive are most popular, and whether a blog is front-loaded or long-tailed. The thesis is that the words in a blog are not always what matters most.

In a guest post this morning at ProBlogger, Skellie discusses how the value of social media visitors is different and inherently more complex than the value of visitors generated from traditional methods, such as search and feedreaders. Her eight points further support my ideas that the old advertising models are not the best suited for the new blogging world.

Stepping away from the existing advertising models that have been used since blogging popularity exploded in 2005 and 2006, it is clear that the new, interactive social web model requires an advertising approach that centers on community and conversation, rather than the older idea of context and aggregated readership.

The Current Model

Current blog advertising falls into two categories:

  1. Contextual Ads. This is the Google model, and is based on the ad network auctioning off keywords and phrases to advertisers for the privilege of seeing their ad links or images appear on pages that contain those words or phrases.
  2. Sponsored Ads. Once a blog is popular enough and can prove a well-developed audience, the blogger can offer to sell space on his blog to advertisers who wish to have their products, offerings or companies presented to the target audience.

In my opinion, these two approaches fail blog owners.

Contextual ads understand the content of the page, but do not understand the popularity of the page, or its relationship to the popularity of other pages in the archive. Contextual ads lack a sense of community, a sense of conversation. While the model has proven successful, it does not maximize the reach that a blog has with its own audience.

Sponsored ads understand the audience that the blog reaches, but do not account for posts that draw the readers’ attention for the longest time, both in terms of time spent reading and thinking about the post as well as over time in an historical sense. The sponsored ad model assumes that all posts get equal attention, or drive community and conversation to the same degree.

The New Model

In the new model, more effective use of visitor analytics is vital to shaping the type and value of the ads sold. Studying the visitor statistics of a blog will allow the owners to see whether the blog is, in general, front-loaded or long-tailed.

If the blog has a front-loaded audience, the most recent posts are of higher value and could be auctioned off at higher prices. In order for this to work, both the ad-hoster and the advertiser would have to agree to the value of the most recent posts using a proven and open statistical analysis methodology. In the case of front-loaded blogs, this analysis methodology would have to demonstrate that there is a higher traffic volume for posts that are between 0-3 days old (setting a hypothetical boundary on front-loading).
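As a minimal sketch of how the 0-3 day test might be checked, assume the tracking data can be exported as pairs of (date of the page view, date the viewed post was published). The function name, the sample data and the three-day cutoff are illustrative assumptions, not part of any existing tool.

```python
from datetime import date


def front_loaded_share(views, max_age_days=3):
    """Return the fraction of page views that landed on posts
    no more than max_age_days old at the time of the view.

    views: iterable of (view_date, post_date) pairs, one per page view.
    """
    total = 0
    recent = 0
    for view_date, post_date in views:
        total += 1
        age_in_days = (view_date - post_date).days
        if 0 <= age_in_days <= max_age_days:
            recent += 1
    return recent / total if total else 0.0


# Hypothetical sample: four page views on the same day.
sample_views = [
    (date(2008, 9, 18), date(2008, 9, 18)),  # post published that day
    (date(2008, 9, 18), date(2008, 9, 16)),  # post is two days old
    (date(2008, 9, 18), date(2005, 6, 1)),   # archive post
    (date(2008, 9, 18), date(2006, 3, 8)),   # archive post
]

print(front_loaded_share(sample_views))
```

A share that stays high month after month would support a front-loaded price; a low share points toward the long-tailed pricing discussed next.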

For blogs that are long-tailed, those posts that continue to draw consistent traffic would be valued far more highly than those that fall out into the general ebb and flow of a blogger’s traffic. These posts have proven historically that they rank highly in search results and are visited often.

In addition to the posts themselves, the comment stream has to be considered. Posts that generate an active conversation are far more valuable than those that don’t. Again, showing the value of the conversation is reliant on the ability to track the number of people in the conversation (through Disqus or some other commenting system).

This model can be further augmented by using a tool like Lookery that helps to clearly establish the demographics of the blog audience. Being able to pinpoint not only where on a blog to advertise but also who the visitors are who view those pages provides a further selling point for this new model and helps build faith in the virtues of a blog that sells space using this new, more effectively targeted advertising pricing structure.

Now, I separate the front-loaded and long-tailed blogs as if they are distinct. Obviously these categories apply to nearly every blog as there are new posts that suddenly capture the imagination of an audience, and there are older posts that continue to provide specific information that draws a steady stream of traffic to them.

Summary

This is a very early stage idea, one that has no code or methodology to support it. However, I believe that the current contextual advertising model, one based solely on the content of the post, is not allowing the content creators and blog entities to take advantage of their most valuable resource - their own posts and the conversations that they create.

I also believe that blog owners are not taking advantage of their own best resource, Web analytics, to help determine the price for advertising of their site. Not all blog posts are created or read equally. Being able to very clearly show what drives the most eyeballs to your site is a selling point that can be used in a variable-price advertising model.

By providing tools to blog owners that intimately link the analytics they already gather and the advertising space they have to sell, a new advertising model can arise, one that is uniquely suited to the new Web. This advertising model will be founded in the concepts of conversation and community, providing more discretely targeted eyeballs to advertisers, and higher ad revenues to blog owners and content creators.

UPDATES

It appears that BuzzLogic has already started down this path. VentureBeat has commentary here.


Blog Statistics Analysis: Page Views by Day of Week, or When to Post

September 16th, 2008 by smp | Comments | Filed in Blogging, Commentary

Since I started self-hosting this blog again on August 6 2008, I have been trying to find more ways to pull traffic toward the content that I put up. Like all bloggers, I feel that I have important things to say (at least in the area of Web performance), and ideas that should be read by as many people as possible.

As well, I have realized that if I invest some time and effort into this blog, it can be a small revenue source that could get me that much closer to my dream of a MacBook Pro.

The Analysis

In a post yesterday morning, Darren Rowse had some advice on when the best time to release a new post is. Using his ideas as the framework, I pulled the data out of my own tracking database and came up with the chart below. This shows the page view data between September 1 2007 and September 15 2008 based on the day of the week visitors came to the site.

Blog Page Views by Day of Week

Using this data and the general framework that Darren subscribes to, I should be releasing my best and newest thoughts in a week on Monday and Tuesday (GMT).
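For anyone who wants to repeat the exercise, here is a rough sketch of the day-of-week tally. It assumes the page views can be pulled out of a tracking database as a list of dates (already converted to GMT); the function name and the sample dates are made up for illustration.

```python
from collections import Counter
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def views_by_weekday(view_dates):
    """Count page views for each day of the week.

    view_dates: iterable of datetime.date objects, one per page view.
    Returns a dict keyed by day name, Monday first.
    """
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    counts = Counter(d.weekday() for d in view_dates)
    return {name: counts.get(index, 0) for index, name in enumerate(DAY_NAMES)}


# Hypothetical sample: two Monday views, one Tuesday, one Saturday.
sample_dates = [
    date(2008, 9, 15),
    date(2008, 9, 15),
    date(2008, 9, 16),
    date(2008, 9, 20),
]

for day, count in views_by_weekday(sample_dates).items():
    print(day, count)
```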

After Wednesday, I should release only less in-depth articles, with a focus on commentary on news and events. And I must learn to breathe, as I suffer from an ailment all too common in bipolars: a lack of patience.

A new post doesn’t immediately find its target audience unless you have hundreds or thousands (Tens? Ones?) of readers who are influential. If you are lucky in this regard, then these folks will leave useful comments, and through their own attention, help gently show people that a new post is something they should devote their valuable attention towards.

It takes a while for any post to percolate through the intertubes. So patience you must have.

Front-loaded v Long-tailed

Unless, of course, your traffic model is completely different from that of a popular blogger.

The one issue that I had with Darren’s guidance is that it applies only to blogs that are front-loaded. A front-loaded blog is one that is incredibly popular, or has a devoted, active audience who help push page views toward the most recent 3-5 posts. Once the wave has crested, or the blogger has posted something new, the volume of traffic to older posts falls off exponentially, except in the few cases of profound or controversial topics.

When I analyzed my own traffic, I found that most of my traffic volume was aimed toward posts from 2005 and 2006. In fact, more recent posts are nowhere near as popular as these older posts. In contrast to the front-loaded blog, mine is long-tailed.

There are a number of influential items in my blog which have proven staying power, which draw people from around the world. They have had deep penetration into search engines, and are relevant to some aspect of people’s lives that keeps pulling them back.

Summary

I would highly recommend analyzing your traffic to see whether it is front-loaded or long-tailed. I know that I wish that this blog was more front-loaded, with an active community of readers and commentators. However, I am also happy to see that I have created a few sparks of content that keep people returning again and again. If your blog is long-tailed, then when you post becomes far less relevant than ensuring the freshness and validity of those few popular posts. Ensure that these are maintained and current so that they remain relevant to as many people as possible.


Blog Statistics Analysis - What do your visitors actually read?

September 14th, 2008 by smp | Comments | Filed in Blogging, Commentary

Steven Hodson of WinExtra posted a screenshot of his personal Wordpress stats for the last three years last night. I then posted my stats for a similar period of time, and Steven shot back with some questions about traffic, and the ebbs and flows of readers.

Being the stats nut that I am, I went and pulled the data from my own tracking data, and came up with this.

Blog Posts Read Each Month, By Year Posted

I made a conscious choice to analyze what year the posts being read were posted in. I wanted to understand when people read my content, which content kept people coming back over and over again. The chart above speaks for itself: through most of the last year it’s clear that the most popular posts were made in 2005.
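A sketch of the grouping behind a chart like this, under the assumption that each page view can be exported with two dates: when the view happened and when the post was published. The function name and sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import date


def views_by_month_and_post_year(views):
    """Group page views by the month they occurred in, then by the
    year the viewed post was published.

    views: iterable of (view_date, post_date) pairs, one per page view.
    Returns {"YYYY-MM": {post_year: view_count}}.
    """
    table = defaultdict(lambda: defaultdict(int))
    for view_date, post_date in views:
        month = view_date.strftime("%Y-%m")
        table[month][post_date.year] += 1
    # Convert nested defaultdicts to plain dicts for easier printing.
    return {month: dict(years) for month, years in table.items()}


# Hypothetical sample of four page views.
sample_views = [
    (date(2008, 8, 10), date(2005, 6, 1)),
    (date(2008, 8, 12), date(2005, 7, 4)),
    (date(2008, 8, 20), date(2008, 8, 19)),
    (date(2008, 9, 1), date(2006, 3, 8)),
]

for month, years in sorted(views_by_month_and_post_year(sample_views).items()):
    print(month, years)
```

Each inner dict is one column of a stacked chart: the month on the axis, one band per publication year.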

What is also interesting is the decreasing interest in 2007 posts as 2008 progressed. Posts from 2006 remained steady, as there are a number of posts in that year that amount to my self-help guides to Web compression, mod_gzip, mod_deflate, and Web caching for Web administrators.

This data is no surprise to me, as I posted my rants against Gutter Helmet and their installation process in 2005. Those posts are still near the top of the Google search response for the term “Gutter Helmet”. And improving the performance of a Web site is of great interest to many Apache server admins and Web site designers.

What is also clear is that self-hosting my blog and the posting renaissance it has provoked has driven traffic back to my site.

So, what lessons did I learn from this data?

  1. Always remember the long tail. Every blogger wants to be relevant, on the edge, and showing that they understand current trends. The people who follow those trends are a small minority of the people who read blogs. Google and other search engines will expose them to your writings in the time of their choosing, and you may find that the three-year-old post gets as much traffic as the one posted three hours ago.
  2. Write often. I was in a blogging funk when my blog was at Wordpress.com. As a geek, I believe that the lack of direct control over the look and feel of my content was the cause of this. In a self-hosted environment, I feel that I am truly the one in charge, and I can make this blog what I want.
  3. Be cautious of your fame. If your posts are front-loaded, i.e. if all your readers read posts from the month and year they are posted in, are you holding people’s long-term attention? What have you contributed to the ongoing needs of those who are outside the technical elite? What will drive them to keep coming to your site in the long run?

So, I post a challenge to other bloggers out there. My numbers are minuscule compared to the blogging elite, but I am curious to get a rough sense of how the long tail is treating you.


Web Performance, Part IX: Curse of the Single Metric

September 5th, 2008 by smp | Comments | Filed in Commentary, The Web, Web Performance, WebPerformance.Org, Work

While this post is aimed at Web performance, the curse of the single metric affects our everyday lives in ways that we have become oblivious to.

When you listen to a business report, the stock market indices are an aggregated metric used to represent the performance of a set group of stocks.

When you read about economic indicators, these values are the aggregated representations of complex populations of data, collected from around the country, or the world.

Sport scores are the final tally of an event, but they may not always represent how well each team performed during the match.

The problem with single metrics lies in their simplicity. When a single metric is created, it usually attempts to factor in all of the possible and relevant data to produce an aggregated value that can represent a whole population of results.

These single metrics are then portrayed as a complete representation of this complex calculation. The presentation of this single metric is usually done in such a way that their compelling simplicity is accepted as the truth, rather than as a representation of a truth.

In the area of Web performance, organizations have fallen prey to this need for the compelling single metric: the need to represent a very complex process in terms that can be quickly absorbed and understood by as large a group of people as possible.

The single metrics most commonly found in the Web performance management field are performance (end-to-end response time of the tested business process) and availability (success rate of the tested business process). These numbers are then merged and transformed by data from a number of sources (external measurements, hit counts, conversions, internal server metrics, packet loss), and this information is bubbled up in an organization. By the time senior management and decision-makers receive the Web performance results, they are likely several steps removed from the raw measurement data.

An executive will tell you that information is a blessing, but only when it speeds, rather than hinders, the decision-making process. A Web performance consultant (such as myself) will tell you that basing your decisions on a single metric that has been created out of a complex population of data is madness.

So, where does the middle ground lie between the data wonks and the senior leaders? The rest of this post is dedicated to introducing a small set of metrics that will give senior leaders better information to work from when deciding what to do next.

A great place to start this process is to examine the percentile distribution of measurement results. Percentiles are known to anyone who has children. After a visit to the pediatrician, someone will likely state that “My son/daughter is in the XXth percentile of his/her age group for height/weight/tantrums/etc”. This means that XX% of the population of children that age, as recorded by pediatricians, report values at or below the same value for this same metric.

Percentiles are great for a population of results like Web performance measurement data. Using only a small set of values, anyone can quickly see how many visitors to a site could be experiencing poor performance.

If at the median (50th percentile), the measured business process is 3.0 seconds, this means that 50% of all of the measurements looked at are being completed in 3.0 seconds or less.

If the executive then looks up to the 90th percentile and sees that it’s at 16.0 seconds, it can be quickly determined that something very bad has happened to affect the response times collected for the 40% of the population between these two points. Immediately, everyone knows that for some reason, an unacceptable number of visitors are likely experiencing degraded and unpredictable performance when they visit the site.
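To make the median and 90th percentile comparison concrete, here is a small sketch using the nearest-rank method (one of several accepted ways to compute a percentile; measurement tools may use a different one). The response times are invented so that they reproduce the 3.0 second and 16.0 second figures used above.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest measurement such that at
    least pct percent of all measurements are at or below it."""
    if not values:
        raise ValueError("no measurements supplied")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical response times in seconds for one business process.
response_times = [1.8, 2.1, 2.4, 2.7, 3.0, 3.4, 4.9, 9.5, 16.0, 31.2]

median = percentile(response_times, 50)
p90 = percentile(response_times, 90)

print("median:", median)
print("90th percentile:", p90)
print("spread between them:", p90 - median)
```

Reporting the two values side by side, rather than one average, is what exposes the gap.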

A suggestion for enhancing averages with percentiles is to use the 90th percentile value as a trim ceiling for the average. The untrimmed and trimmed averages can then be compared side by side. For sites with a larger number of response time outliers, the average will decrease dramatically when it is trimmed, while sites with more consistent measurement results will find their average response time is similar with and without the trimmed data.
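A sketch of the trimmed average, assuming "trim ceiling" means discarding every measurement above the 90th percentile before averaging, and assuming a nearest-rank percentile. Both sample data sets are invented: one with outliers, one consistent.

```python
import math
from statistics import mean


def percentile(values, pct):
    """Nearest-rank percentile of a list of measurements."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def trimmed_mean(values, ceiling_pct=90):
    """Average of the measurements at or below the ceiling percentile."""
    ceiling = percentile(values, ceiling_pct)
    kept = [v for v in values if v <= ceiling]
    return mean(kept)


# Hypothetical response times in seconds.
spiky = [1.8, 2.1, 2.4, 2.7, 3.0, 3.4, 4.9, 9.5, 16.0, 31.2]
consistent = [2.8, 2.9, 3.0, 3.0, 3.1, 3.1, 3.2, 3.2, 3.3, 3.4]

for name, data in (("spiky", spiky), ("consistent", consistent)):
    print(name, "untrimmed:", round(mean(data), 2),
          "trimmed:", round(trimmed_mean(data), 2))
```

The spiky site's average drops sharply once trimmed, while the consistent site's barely moves, which is the contrast described above.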

It is also critical to examine the application’s response times and success rates throughout defined business cycles. A single response time or success rate value eliminates

  • variations by time of day
  • variations by day of week
  • variations by month
  • variations caused by advertising and marketing

An average is just an average. If at peak business hours, response times are 5.0 seconds slower than the average, then the average is meaningless, as business is being lost to poor performance which has been lost in the focus on the single metric.
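As an illustration of how an overall average hides time-of-day variation, this sketch groups measurements by hour. It assumes each measurement is a (timestamp, response time in seconds) pair; the function name and sample values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def mean_by_hour(measurements):
    """Average response time for each hour of the day.

    measurements: iterable of (timestamp, seconds) pairs.
    Returns {hour: average_seconds}, ordered by hour.
    """
    buckets = defaultdict(list)
    for timestamp, seconds in measurements:
        buckets[timestamp.hour].append(seconds)
    return {hour: mean(values) for hour, values in sorted(buckets.items())}


# Hypothetical sample: fast overnight, slow in the afternoon peak.
sample = [
    (datetime(2008, 9, 5, 3, 0), 2.0),
    (datetime(2008, 9, 5, 3, 30), 2.0),
    (datetime(2008, 9, 5, 14, 0), 8.0),
    (datetime(2008, 9, 5, 14, 30), 8.0),
]

overall = mean([seconds for _, seconds in sample])
hourly = mean_by_hour(sample)

print("overall average:", overall)
for hour, average in hourly.items():
    print(f"{hour:02d}:00 average:", average)
```

The same grouping works for day of week or month by swapping `timestamp.hour` for `timestamp.weekday()` or `timestamp.month`.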

All of these items have also fallen prey to their own curse of the single metric. All of the items discussed above aggregate the response time of the business process into a single metric. The process of purchasing items online is broken down into discrete steps, and different parts of this process likely take longer than others. And one step beyond the discrete steps are the objects and data that appear to the customer during these steps.

It is critical to isolate the performance for each step of the process to find the bottlenecks to performance. Then the components in those steps that cause the greatest response time or success rate degradations must be identified and targeted for performance improvement initiatives. If there are one or two poorly performing steps in a business process, focusing performance improvement efforts on these is critical, otherwise precious resources are being wasted in trying to fix parts of the application that are working well.
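A minimal sketch of the step-level breakdown, assuming per-step response times are available for each run of the business process. The step names and timings are invented for illustration.

```python
from statistics import mean


def slowest_steps(step_timings, top=2):
    """Rank the steps of a business process by average response time.

    step_timings: dict mapping step name to a list of measured seconds.
    Returns the top slowest steps as (name, average_seconds) pairs.
    """
    averages = {step: mean(times) for step, times in step_timings.items()}
    ranked = sorted(averages.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]


# Hypothetical measurements for a four-step purchase process.
checkout = {
    "home page": [0.8, 0.9, 1.0],
    "search": [1.1, 1.2, 1.3],
    "add to cart": [0.6, 0.7, 0.8],
    "payment": [5.5, 6.0, 6.5],
}

for step, average in slowest_steps(checkout):
    print(step, round(average, 2))
```

Here the payment step dominates the end-to-end time, so that is where the improvement effort would go first.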

In summary, a single metric provides a sense of false confidence, the sense that the application can be counted on to deliver response times and success rates that are nearly the same as those simple, single metrics.

The average provides a middle ground, a line that says this is the approximate mid-point of the measurement population. There are measurements above and below this average, and you have to plan around the peaks and valleys, not the open plains. It is critical never to fall victim to the attractive charms that come with the curse of the single metric.


Web Performance Concepts - Additional Articles

September 2nd, 2008 by smp | Comments | Filed in Web Performance, WebPerformance.Org

When I re-introduced my five articles on Web Performance Concepts last night, I had forgotten that I had already written two additional articles in the series.

  1. Web Performance, Part VI: Benchmarking Your Site
  2. Web Performance, Part VII: Reliability and Consistency

Look for Parts VIII and IX in the next few days.


Web Performance Concepts Series - Revisited

August 31st, 2008 by smp | Comments | Filed in Commentary, Web Performance, WebPerformance.Org, Work

Two years ago I created a series of five blog articles, aimed at both business and technical readers, with the goal of explaining the basic statistical concepts and methods I use when analyzing Web performance data in my role as a Web performance consultant.

Most of these ideas were core to my thinking when I developed GrabPERF in 2005-2006, as I determined that it was vital that people not only receive Web performance measurement data for their site, but they receive it in a way that allows them to inform and shape the business and technical decisions they make on a daily basis.

While I come from a strong technical background, it is critical to be able to present the data that I work with in a manner that can be useful to all components of an organization, from the IT and technology leaders who shape the infrastructure and design of a site, to the marketing and business leaders who set out the goals for the organization and interact with customers, vendors and investors.

Providing data that helps negotiate the almost religious dichotomy that divides most organizations is crucial to providing a comprehensive Web performance solution to any organization.

These articles form the core of an ongoing series of discussions focused on the pitfalls of Web performance analysis, and how to learn from and avoid the errors others have already discovered.

The series went over like a lead balloon and this left me puzzled. While the basic information in the articles was technical and focused on the role that simple statistics play in affecting Web performance technology and business decisions inside an organization, they formed the core of what I saw as an ongoing discussion that organizations need to have to ensure that an organization moves in a single direction, with a single purpose.

I have decided to reintroduce this series, dredging it from the forgotten archives of this blog, to remind business and IT teams of the importance of the Web performance data they use every day. It also serves as a guide to interpreting the numbers that arise from all the measurement methodologies that companies use, a map to extract the most critical information in the raging sea of data.

The five articles are:

  1. Web Performance, Part I: Fundamentals
  2. Web Performance, Part II: What are you calling ‘average’?
  3. Web Performance, Part III: Moving Beyond Average
  4. Web Performance, Part IV: Finding The Frequency
  5. Web Performance, Part V: Baseline Your Data

I look forward to your comments and questions on these topics.


IP Registry Statistics - August 2007

August 22nd, 2007 by smp | Comments | Filed in Life

My system has a daily job to collect and aggregate the IP Blocks distributed by the five registrars into a single database, and then provide high-level WHOIS information for this data. If you want to try this yourself, the interface is here.

On an extremely irregular basis, I aggregate the statistics from this data, and present it to the masses for examination. I might actually automate this data someday!

So, for August 2007 (as of August 21, 2007), here are the aggregated IP distribution statistics broken down by registrar and country.

(more…)


GrabPERF: Some System Statistics

August 7th, 2006 by smp | Comments | Filed in GrabPERF

Over the last year, GrabPERF has been something that has caught the fancy of a few in the Blogging/Social Media world. It has given some perspective of how performance can affect business and image in the connected world.

But what of GrabPERF itself? It has been on a development hiatus for the last few months due to pressures from my “real” job and various trips (business and pleasure) that I have been undertaking. Over the last two weeks, I have been trying to clear out the extra measurements and focus the features and attention on the community that appears most interested in the data.

During this process, I heard back from some folks who had been using GrabPERF in stealth mode (even I can’t track all the hits!), and who asked, “Hey! Where did my data go?”. Glad to hear from all of you.

Just to give everyone some idea of the growth, here is a snapshot of aggregated daily performance and number of measurements.

GrabPERF Statistics (by day)

The number of measurements shot up, until I started culling the unused measurements. Over the last 3 weeks, average performance became extremely variable, and that’s when I began considering the culling. As well, the New York PubSub Agent appears to have gone permanently offline, as a part of their winding down process.

The fact that the system was taking 390,000 measurements per day still astounds me.

This was also comparable to the number of distinct sites we were measuring.

GrabPERF Statistics (up to August 6, 2006)

After the latest cull, we are down to 84 distinct tests, a level last seen on November 27, 2005.

I am pleased that the system has held together as well as it has.


StatCounter Performance Issue

March 14th, 2006 by smp | Comments | Filed in GrabPERF, Web Performance

This afternoon, StatCounter showed a marked increase in response time.

StatCounter -- Mar 14 2006

Normally I wouldn’t highlight an issue that only lasted an hour, but this appears to have been a very unusual issue that saw the page size decrease to nearly nothing, and response time shoot up to around 45 seconds. This combination usually indicates a back-end application timeout which then presents users with an error message.
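The pattern described here (tiny page, very slow response) could be flagged automatically with a simple rule. This is only a sketch: the function name, the baseline figures and both thresholds are hypothetical and would need tuning against real measurement data.

```python
def looks_like_backend_timeout(page_bytes, response_seconds,
                               baseline_bytes, baseline_seconds,
                               size_ratio=0.1, slowdown_factor=5.0):
    """Flag a measurement whose page size has collapsed while its
    response time has ballooned, relative to a known-good baseline.

    size_ratio and slowdown_factor are assumed thresholds, not values
    taken from any measurement tool.
    """
    page_collapsed = page_bytes <= baseline_bytes * size_ratio
    response_ballooned = response_seconds >= baseline_seconds * slowdown_factor
    return page_collapsed and response_ballooned


# Hypothetical baseline: a 40 KB page that normally loads in 2 seconds.
print(looks_like_backend_timeout(500, 45.0, 40_000, 2.0))     # suspect
print(looks_like_backend_timeout(39_000, 2.2, 40_000, 2.0))   # normal
```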

StatCounter is in the GrabPERF Site Statistics Index.


GrabPERF Site Statistics | Web Analytics Index - Mar 08 2006

March 8th, 2006 by smp | Comments | Filed in GrabPERF, Web Performance

The Site Statistics | Web Analytics Index measurements have been running now for about 2.5 days, and I wanted to make some general comments on what I am seeing.

The methodology for testing is straightforward. I chose sites | services that allowed you to create a free (if limited) account to track your Web visitors, and allowed you to make these statistics available for anyone to look at. Using this, a measurement was established against the landing page that visitors would see if they chose to look at these publicly available statistics.

I am using this blog as the placeholder for the tracking “bugs”  used in this index (see the right-hand column).

Site Stat Services Index - Mar 08 2006

From the graph above, it is clear that ShinyStat is the performance leader in this space. They have the smallest overall page size as well as the fastest and most reliable performance.

It is important to note that services such as WebTrends, Omniture, WebSideStory and Coremetrics are not included, as they are beyond the reach of most bloggers, and do not provide a public side to their data. Also, Google Analytics is not included, as they do not provide public access to the collected data.

The collected data is available in GrabPERF as both the Site Statistics Index, and as individual measurements.
