Tuesday, April 14, 2009

The Math Behind Google Website Optimizer

Google Website Optimizer works by embedding tracking code in your website and collecting data from it. Website Optimizer then performs some cool calculations and generates reports showing the chance that the refined page performs better than the original.

However, if you want multiple conversion pages in one experiment, Google merges them in the final report, so you only see how the landing page is working as a whole. It is not possible to see the result for each conversion page separately. On the other hand, if you run separate experiments for the different conversion pages, you can see the separate results but not the whole picture of how the landing page is doing.



Sometimes knowing both the separate results and the integrated result is essential to an online business. It therefore helps to know the calculations behind Google Website Optimizer, which allows more flexibility in building the reports.

The raw data is shown in the rightmost column of the Google Website Optimizer report, and that is where all the calculations begin.

The estimated conversion rate is simply the number of conversions divided by the number of impressions. The range (also known as the margin of error) is discussed further below.

The observed improvement is the easiest calculation. The conversion rate of the original page serves as the base (the denominator), and the difference in conversion rate between the refined page and the original page is the numerator:

(refined conversion / refined impression) – (original conversion / original impression)

Dividing this difference by (original conversion / original impression) gives the observed improvement.
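The two calculations so far can be sketched in a few lines of Python. This is only an illustration: the function names and the sample numbers are made up for the example and are not taken from Website Optimizer itself.

```python
def conversion_rate(conversions, impressions):
    """Estimated conversion rate: conversions divided by impressions."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return conversions / impressions


def observed_improvement(orig_conv, orig_imp, new_conv, new_imp):
    """Relative change in conversion rate, with the original page as the base."""
    base = conversion_rate(orig_conv, orig_imp)
    refined = conversion_rate(new_conv, new_imp)
    if base == 0:
        raise ValueError("original conversion rate is zero; improvement is undefined")
    return (refined - base) / base


# Hypothetical data: 50 conversions out of 1000 impressions on the original page,
# 65 conversions out of 1000 impressions on the refined page.
improvement = observed_improvement(50, 1000, 65, 1000)
print(f"Observed improvement: {improvement:.1%}")
```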

The range for the estimated conversion rate of a single page is determined by the standard deviation and the number of impressions shown to visitors. Statistically speaking, if all the visitors being observed behave in a similar way (homogeneity), say, very few of them convert, the standard deviation is small and relatively few visitors are needed to get a narrow range. However, if the visitors display heterogeneity, we need more observations to reach the same level of confidence.

The margin of error would be calculated with the following equation:
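A standard form of this margin of error, assuming the usual normal approximation for a proportion, is sketched below. The symbols are chosen for this illustration: p is the estimated conversion rate, n is the number of impressions, and z is the z score.

```latex
\text{margin of error} = z \sqrt{\frac{p\,(1 - p)}{n}}
```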

The z score corresponds to the confidence level you want to reach. As a common practice, we use 1.96 as the z score for a 95% confidence level, meaning that if the experiment were repeated many times, about 95% of the ranges calculated this way would contain the true conversion rate. Please note that the margin of error from this calculation is the distance on one side of the estimate only. The plus and minus sign expands it to a two-sided range whose total width is twice the margin of error, indicating that the true conversion rate may be that much higher or lower than the estimate.
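As a rough sketch, the margin of error can be computed in Python as follows. It assumes the normal approximation for a proportion (z times the square root of p(1 - p)/n), which may differ in detail from what Website Optimizer does internally; the function name and the sample numbers are hypothetical.

```python
import math
from statistics import NormalDist


def margin_of_error(conversions, impressions, confidence=0.95):
    """One-sided margin of error for a conversion rate (normal approximation)."""
    p = conversions / impressions
    # Two-sided z score: 0.975 quantile for 95% confidence, roughly 1.96.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * math.sqrt(p * (1 - p) / impressions)


# Hypothetical data: 50 conversions out of 1000 impressions.
rate = 50 / 1000
moe = margin_of_error(50, 1000)
print(f"Estimated conversion rate: {rate:.1%} +/- {moe:.1%}")
print(f"Range: {rate - moe:.1%} to {rate + moe:.1%}")
```

The range printed on the last line is the two-sided interval: the estimate minus the margin of error up to the estimate plus the margin of error.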

The chance to beat the original comes from comparing the conversion rate of the new page with that of the old one.

First of all, we need to decide whether the visitors to the new page and to the old page show similar variability in conversion (similar standard deviations). To do so, we first calculate the standard deviation for each page using the following equation.
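Assuming each impression is treated as a yes/no outcome (converted or not), the standard deviation for a page with conversion rate p takes the usual form below. The symbols s and p are chosen for this illustration.

```latex
s = \sqrt{p\,(1 - p)}
```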

We then put the larger variance (the squared standard deviation) over the smaller one to calculate the F value for testing. Here again, we assume a 95% level of confidence in deciding whether the two groups have similar standard deviations. In Excel, the function “FINV” returns the critical F value. In “FINV”, the probability is 0.05, which limits the chance of making a false conclusion to 5%. The first degree of freedom is the impression count of the page with the larger standard deviation less 1 (impression - 1), and the second degree of freedom is the other page's impression count less 1. If the critical F value is larger than the testing F value, we can assume that the two groups have equal variances and use the pooled standard deviation (Spool) in the calculations that follow.
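Here is a small Python sketch of the testing F value and its degrees of freedom. It assumes the variance of each page is p(1 - p), and it leaves the critical value to Excel's FINV rather than computing it; the function name and sample numbers are hypothetical.

```python
def f_ratio(conv_a, imp_a, conv_b, imp_b):
    """Return (F, df1, df2) with the larger variance in the numerator."""
    p_a = conv_a / imp_a
    p_b = conv_b / imp_b
    var_a = p_a * (1 - p_a)  # per-impression variance, assuming yes/no outcomes
    var_b = p_b * (1 - p_b)
    if min(var_a, var_b) == 0:
        raise ValueError("a variance of zero makes the F ratio undefined")
    if var_a >= var_b:
        return var_a / var_b, imp_a - 1, imp_b - 1
    return var_b / var_a, imp_b - 1, imp_a - 1


# Hypothetical data: original 50/1000, refined 65/1000.
f_value, df1, df2 = f_ratio(50, 1000, 65, 1000)
print(f"F = {f_value:.4f}, df1 = {df1}, df2 = {df2}")
# Compare f_value with the critical value from Excel: FINV(0.05, df1, df2).
```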

The statistical basis for calculating the chance is the independent-samples t-test. The calculation takes into account the difference between the two conversion rates and the sample sizes (in this case, impressions). In Excel, the function “TDIST” returns the tail probability of the t distribution for a given test value and degrees of freedom.

Here are the equations for the testing value and the degrees of freedom in the t distribution.
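For reference, the textbook equal-variance form of the testing value is sketched below. It is an assumption that Website Optimizer uses exactly this form. The symbols are chosen for this illustration: p is the conversion rate, n the impressions, and s the standard deviation of pages A (original) and B (refined).

```latex
t = \frac{p_B - p_A}{s_{pool}\sqrt{\dfrac{1}{n_A} + \dfrac{1}{n_B}}},
\qquad
s_{pool}^2 = \frac{(n_A - 1)\,s_A^2 + (n_B - 1)\,s_B^2}{n_A + n_B - 2}
```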

degrees of freedom = impression A + impression B - 2.
The probability returned for that t distribution is the chance to beat the original.
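Putting the pieces together, here is a minimal Python sketch of the chance to beat the original. It makes two assumptions that should be stated loudly: it uses the textbook pooled-variance t statistic, and it swaps the t distribution for the standard normal distribution (via the standard library's NormalDist), which is a close approximation when the impressions number in the hundreds or more. The function name and sample numbers are hypothetical.

```python
import math
from statistics import NormalDist


def chance_to_beat_original(orig_conv, orig_imp, new_conv, new_imp):
    """Approximate probability that the refined page converts better."""
    p_o = orig_conv / orig_imp
    p_n = new_conv / new_imp
    var_o = p_o * (1 - p_o)
    var_n = p_n * (1 - p_n)
    df = orig_imp + new_imp - 2  # degrees of freedom of the t-test
    pooled_var = ((orig_imp - 1) * var_o + (new_imp - 1) * var_n) / df
    std_err = math.sqrt(pooled_var * (1 / orig_imp + 1 / new_imp))
    if std_err == 0:
        raise ValueError("no variation in the data; the test is undefined")
    t = (p_n - p_o) / std_err
    # Assumption: with a large df the t distribution is nearly normal,
    # so the normal CDF stands in for Excel's TDIST here.
    return NormalDist().cdf(t)


# Hypothetical data: original 50/1000, refined 65/1000.
chance = chance_to_beat_original(50, 1000, 65, 1000)
print(f"Chance to beat original: {chance:.1%}")
```

With identical data on both pages the function returns 50%, which matches the intuition that neither page has an edge.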

Post by: Fangjie Xu