Tuesday, March 13, 2007

Split Testing to Improve Your Website

Most people change their website's pages whenever they get a new idea. They think that each change is going to improve their site and make them more successful.

Of course, there are basic improvements you can make as you are writing the content for your site. And for the first few weeks you may notice some shortcomings that need to be remedied.

But, after a few days or weeks your site becomes stable. You don't find any more errors in spelling or grammar. The graphics look like they belong on the site. And your order link or opt-in form performs correctly.

You're now ready for split testing. This is a slow, incremental improvement of your site through ongoing testing.

Split testing involves making an experimental change to one of your pages, measuring the effects of that change, and analyzing the significance of differences in measured results. In other words, a split test attempts to determine how a change to your site affects some measurable response.

There are several decisions you must make to conduct a split test.

First, you need to determine what change you want to make. Typically you will change a headline, a sentence or two in your sales copy, the price of your product, the wording of your guarantee, an image, or some other single feature of your page.

You will use two (or more) nearly identical pages. The difference is that one page keeps the "original" material while the other, "experimental" page has the change applied.

Second, you must decide what "success event" to measure. For many people, it will be sales of a product or clicks of the order link. Some will want to measure the number of opt-ins. Others will measure clicks on ads from a pay-per-click service such as AdSense.

To measure success, you must be able to tell whether each successful response came from your "original" page or your "experimental" page.

For example, many affiliate programs allow you to include a campaign ID in your link. By placing one campaign ID in the order links on the "original" page and another campaign ID on the "experimental" page you can determine the number of clicks and the number of orders coming from your pages.
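
For instance, the two links might be built like the sketch below, written in Python for illustration. The order URL and the "campaign" parameter name are placeholders, so check your affiliate program's documentation for the exact format it expects.

    # Build order links that carry a campaign ID (sketch only; the URL and
    # parameter name below are placeholders, not any real program's format).
    from urllib.parse import urlencode

    AFFILIATE_ORDER_URL = "http://www.example.com/order"

    def order_link(campaign_id):
        """Return an order link tagged with a campaign ID."""
        return AFFILIATE_ORDER_URL + "?" + urlencode({"campaign": campaign_id})

    print(order_link("original"))      # placed on the "original" page
    print(order_link("experimental"))  # placed on the "experimental" page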

Other people use a redirect script that keeps statistics on each redirect request. Redirect scripts typically use a keyword to select the URL for redirection. You can use keywords like "original" and "experimental" and have both redirected to your affiliate program's order page. Then you can use the admin function of the redirect script to look at the number of clicks to the order page from both your "original" and "experimental" pages.
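
Here is a rough sketch of such a redirect script, again in Python for illustration. A real redirect script would normally run as CGI or PHP on your web host and keep its counts in a file or database; the in-memory counter and console output below simply stand in for its admin function.

    # Minimal redirect script sketch: log the keyword, then redirect to the
    # affiliate order page. Counts live in memory only for illustration.
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.parse import urlparse, parse_qs
    from collections import Counter

    ORDER_PAGE = "http://www.example.com/order"  # placeholder for your affiliate order page
    clicks = Counter()

    class RedirectHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Expects requests like /go?kw=original or /go?kw=experimental
            keyword = parse_qs(urlparse(self.path).query).get("kw", ["unknown"])[0]
            clicks[keyword] += 1                   # record the click before redirecting
            self.send_response(302)                # temporary redirect
            self.send_header("Location", ORDER_PAGE)
            self.end_headers()
            print(dict(clicks))                    # crude stand-in for an admin page

    if __name__ == "__main__":
        HTTPServer(("", 8000), RedirectHandler).serve_forever()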

Next, you'll need a script to randomly deliver your original or experimental pages to your site's visitors. It would be helpful for this script to place a cookie on the visitor’s computer so the same page is delivered when the visitor returns.
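
A bare-bones sketch of that assignment logic, independent of any particular web server or framework, might look like the following. The cookie name and page file names are placeholders.

    # Randomly assign a visitor to a page and remember the choice in a cookie.
    import random
    from http.cookies import SimpleCookie

    PAGES = {"original": "sales_a.html", "experimental": "sales_b.html"}  # placeholder file names

    def choose_page(cookie_header):
        """Return (variant, page file, Set-Cookie header or None)."""
        cookie = SimpleCookie(cookie_header or "")
        if "split_variant" in cookie and cookie["split_variant"].value in PAGES:
            # Returning visitor: serve the same variant as before.
            variant = cookie["split_variant"].value
            return variant, PAGES[variant], None
        # New visitor: flip a coin and store the result in a cookie.
        variant = random.choice(list(PAGES))
        return variant, PAGES[variant], "split_variant=%s; Path=/" % variant

    print(choose_page(None))                      # new visitor, random assignment
    print(choose_page("split_variant=original"))  # returning visitor, same page again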

Finally, you'll need to analyze the results. The Chi Square statistic is often used to determine the significance of experiments like this one. While differences in results may look satisfyingly clear, they often are not statistically significant.

For example, consider two pages that are each displayed 500 times. One page resulted in 20 sales while the other page resulted in 30 sales. "WOW", you say. "One page caused 50% more sales than the other page. That's got to be meaningful."

In this case we had a total of 50 sales; all things being equal, we would expect 25 sales from each page. Each page differed from that expectation by only 5 sales: one made 5 more sales than expected while the other made 5 fewer. This could easily be the result of random variation rather than of any difference between your pages.
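
If you have Python and the SciPy library handy, you can check this example yourself. The table below counts sales and non-sales for each page out of its 500 displays.

    # Chi Square test of the 20-vs-30 example, assuming SciPy is installed.
    from scipy.stats import chi2_contingency

    observed = [[20, 480],   # page A: 20 sales, 480 non-sales
                [30, 470]]   # page B: 30 sales, 470 non-sales

    chi2, p, dof, expected = chi2_contingency(observed, correction=False)
    print(round(chi2, 2), round(p, 3))
    # Roughly chi2 = 2.1 and p = 0.15: a gap this large would appear about
    # 15 times in 100 purely random repeats, so it is not significant at 0.05.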

To help you understand, consider this story. Two random people are each given 500 pennies and each is placed 30 feet from a small can. They toss their pennies at the can. One person gets 30 pennies in the can while the other gets only 20 in the can. Can you conclude that the person who got 30 pennies in was significantly more skilled at tossing pennies than the other person?

No. In fact, if this penny-tossing experiment were repeated 100 times, roughly 15 of those results would differ by as much as or more than our example. That is too close to simple random variation to believe there is a real difference in skill between the two people tossing the pennies.
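
If you like, you can convince yourself of this with a short simulation. The 5% success rate below is only an assumption, chosen so each person sinks about 25 pennies per 500 tosses; the conclusion does not hinge on the exact figure.

    # Simulate two equally skilled penny tossers and see how often their
    # scores differ by 10 or more pennies purely by chance.
    import random

    def pennies_in_can(n=500, p=0.05):        # p is an assumed, equal skill level
        return sum(random.random() < p for _ in range(n))

    repeats = 10000
    big_gaps = sum(abs(pennies_in_can() - pennies_in_can()) >= 10 for _ in range(repeats))
    print(big_gaps / repeats)                 # usually lands around 0.15 to 0.17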

For many experiments, a "statistically significant" result means that differences as large as the ones we see would occur 5 or fewer times if a similar, purely random experiment were repeated 100 times.

So, we should not conclude that there is a significant difference in the ability of our pages to deliver the "success event."

There are now two things we can do. One is to conclude that the change we made to our "experimental" page did not produce a statistically significant difference. In this case we can move on to the next split test experiment.

Or, we can continue this split test and hope that the ratio of sales remains the same. If we carried the split test on longer and found the same ratio of results, the differences could become significant. Consider doubling the number of exposures of your pages. If the ratio held and we then had 60 sales compared to only 40, that result would be statistically significant.

In 100 truly random experiments, successes having differences similar to 60 and 40 would occur fewer than 5 times. This is a good indication that the observed differences were not caused simply by random chance. Rather, we can conclude that there was a real cause for the observed differences.
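
Running the same check on the doubled numbers bears this out, again assuming SciPy is available.

    # Chi Square test of the doubled experiment: 1000 displays per page,
    # 60 sales versus 40 sales.
    from scipy.stats import chi2_contingency

    observed = [[60, 940],   # page A: 60 sales, 940 non-sales
                [40, 960]]   # page B: 40 sales, 960 non-sales

    chi2, p, dof, expected = chi2_contingency(observed, correction=False)
    print(round(chi2, 2), round(p, 3))
    # Roughly chi2 = 4.2 and p = 0.04: below the 0.05 threshold, so significant.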

A Chi Square test is built into many spreadsheets, including Excel. Excel's CHITEST function, for example, compares your actual success counts with the expected counts and returns a probability. When that probability is 0.05 or less, you can conclude that there was a real reason for the differences.

Don't expect every split test experiment to yield important results. Perhaps a third of your split test experiments will show the experimental page significantly improved sales. A third of the time there will be no significant difference. And a third of the time, the experimental page will cause a decrease in sales.
