We have a problem. It’s 2017 and people are still using year-over-year (or week-over-week) as an accurate performance measurement after a site change.

What you should be doing is using real-time optimization techniques to validate whether site changes are truly helping or hurting overall performance. What you should not be doing is using historical data to determine whether a change to your site's appearance is helping or hurting its performance. Allow me to explain.

If it’s not real-time, it’s wrong time.

Real-time optimization means testing a site change against the unchanged version (referred to as the control or the baseline). When a user visits a site, they are randomly assigned to either a changed version of the site (the treatment) or the control (the original). From there, we can track each of the performances against one another at the same point in time.
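If you like to think in code, here is a minimal sketch of that assignment logic. The 50/50 split and the function names are my own assumptions for illustration, not any particular testing platform's API:

```python
import random
from collections import defaultdict

# Assumed 50/50 split; real testing platforms typically hash a stable
# visitor ID so a returning visitor always sees the same variant.
def assign_variant(visitor_id, treatment_share=0.5):
    random.seed(visitor_id)  # stable assignment per visitor
    return "treatment" if random.random() < treatment_share else "control"

# Track performance for both variants over the same window of time.
stats = defaultdict(lambda: {"visitors": 0, "conversions": 0})

def record_visit(visitor_id, converted):
    variant = assign_variant(visitor_id)
    stats[variant]["visitors"] += 1
    stats[variant]["conversions"] += int(converted)

def conversion_rate(variant):
    s = stats[variant]
    return s["conversions"] / s["visitors"] if s["visitors"] else 0.0
```

Because both groups accumulate data over the exact same days, promotions, and traffic mix, their conversion rates can be compared directly at any point in time.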

We use real-time optimization because historical data is an unreliable comparison for many reasons:

  • Promotions, whether currently running or still producing residual effects
  • Each day of the week has its own performance profile. Monday is different from Tuesday, which is different from Saturday, and your Saturdays four months ago may have looked different again
  • Marketing might have brought in new, low-quality traffic (in the hopes that they eventually become high quality)
  • General demand for product, new competition, etc.

What Not to Do: Rely on past site data as a way to determine if a test or new implementation is succeeding or failing.

One of the biggest reasons past data is not an accurate yardstick: your customers are constantly being exposed to new websites and customer experiences, so why would you expect their preferences to remain the same? It is imperative to always compare site change performance in real-time, not against past performance.

As always, these claims are best illustrated through example, so let me walk you through a couple of scenarios:

Scenario 1: False Positive

We recently ran a simple A/B test for a client to test different aesthetics for the desktop navigation. To help illustrate the point, however, we are going to enter the world of make-believe and pretend that the client had simply implemented their idea instead of testing it out first. Then, we will use past data to determine if the site change was a success or not.

The new navigation came into effect on November 1st, 2017. One week of data gave us a conversion rate of 1.18%, which is phenomenal when compared to the year before; in 2016, the first week of November attained a conversion rate of 0.90%.

In addition to year-over-year, we can compare the performance against the last week of October 2017, i.e. the week before the new desktop navigation came into effect. Compared to October's conversion rate of 0.97%, the new navigation's conversion rate of 1.18% is still an incredible lift!
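For the curious, the lifts quoted here are just the simple percentage change between two conversion rates; a quick Python check using the numbers above:

```python
def relative_lift(new_rate, old_rate):
    """Percentage change of new_rate relative to old_rate."""
    return (new_rate - old_rate) / old_rate * 100

# Conversion rates from this example (as percentages)
nov_2017 = 1.18   # first week of November 2017, new navigation live
nov_2016 = 0.90   # first week of November 2016
oct_2017 = 0.97   # last week of October 2017

print(f"Year-over-year lift: {relative_lift(nov_2017, nov_2016):.1f}%")  # ~31.1%
print(f"Week-over-week lift: {relative_lift(nov_2017, oct_2017):.1f}%")  # ~21.6%
```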

The result even runs counter to the expected trends. When looking at expectations moving from October into November year-over-year, the new navigation appeared to have a marked positive impact on overall conversion rate.

Week to Week Lift

To interpret the table:

  • The “Rate Lift From Previous Week” column shows that from October to November 2016, conversion rate was trending downward.
  • The “Rate Lift From Previous Year” column shows that from October 2016 to October 2017, conversion rate was trending downward.

Contrary to both of these expectations of a downward trend, we are seeing a comparatively strong lift in conversion rate performance during the first week of November 2017; therefore, the implementation must have been a success!

Except… it failed.

Despite the apparent conversion rate lift when using past data as the comparison, this site change was actually run as an A/B test, so we can use accurate, real-time data to determine its true impact on the site.

The blue line shows the Treatment CR, while the orange line is the Control CR.

As you can see, the treatment’s CR of 1.18%, while an incredible lift from what we would have expected based on past years, was underperforming when compared to the control’s CR of 1.21%. In just one week, that’s a loss of over $3,000 in revenue. One. Week.
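The dollar figure comes from the test's actual traffic and average order value, which I'm not reproducing here; but as a rough sketch of how that kind of estimate works, with placeholder visitor and order-value numbers (only the two conversion rates come from the test):

```python
# Placeholder inputs: the test's real traffic and average order value
# are not shown in this article, so these two numbers are assumptions.
weekly_visitors = 250_000     # visitors exposed to the test (assumed)
avg_order_value = 45.00       # average order value in dollars (assumed)

control_cr   = 0.0121         # 1.21%, from the real-time control
treatment_cr = 0.0118         # 1.18%, from the treatment

# Revenue gap = visitors * (control CR - treatment CR) * average order value
lost_revenue = weekly_visitors * (control_cr - treatment_cr) * avg_order_value
print(f"Estimated weekly revenue impact: ${lost_revenue:,.2f}")
```

Even a three-hundredths-of-a-percent gap in conversion rate adds up quickly once it is multiplied across a week of traffic and order values.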

Had they implemented this concept without testing, a conversion rate of 1.18% would have looked like an increase compared to the past data and trends of the same week the year before. It is very possible that the company would have deemed it a good implementation, patted themselves on the back, and never looked back. However, had they not implemented the change, the conversion rate would have been even higher with the site untouched (as validated by the control).

The worst part is that without testing in real-time, there would have been no way to verify this clearly negative performance, and the company would have eaten the loss.

If you use past data to determine if a site change was beneficial or not, you might as well be asking a Magic 8 Ball. Site change performance is simply unverifiable without testing in real-time.

Scenario 2: False Negative

Just like a high overall conversion rate cannot be attributed to any one cause, neither can a low conversion rate.

We were working with a client who wanted to optimize the site's product detail pages. The test had been running for about a month and a half when the client received an e-mail from his boss expressing concern that the overall RPV (revenue per visitor) for August was down from the same time last year. Looking at the site's overall analytics, it was clear that she was right; the numbers were down:

The top four values are from August 2016, while the bottom four values are from August 2017.

Our client was gripped by panic; he contacted us, worried that our test was bringing down site-wide revenue. What he didn't realize was that we had been monitoring the test's progress carefully: it was performing terrifically! All three ensembles of treatments had a positive RPV lift against the baseline, with the best-performing ensemble (F1) hovering around a 20% RPV lift.

So why was the overall site performance down? We were able to determine the culprit very quickly because of our dedicated baseline.

Each blue bar is indicative of the relative RPV lift of each combination (ensemble of the two different treatments running) against the baseline (the orange line).

Looking at this data, it's easy to see that the baseline, the "clean" 10% of traffic running on the site that represents what the website would look like without any changes, had the lowest RPV throughout the month of August, at $6.14. It was the normal, unchanged site layout that was performing so poorly when compared to the year before.

This happens. Completely unrelated to testing, a website can go through periods of highs and lows for a myriad of reasons. This is exactly why it is necessary to always continue optimizing. Changes in customer preferences can lead to different (and in this case lowered) overall site metrics.

A shining light in this somewhat sad scenario: the leading ensemble, F1 alone, has an RPV of $7.44, which is higher than any of the peak points found in August of 2016. It is thanks to the multivariate optimization that the client was able to find an improved layout that encouraged customers to keep buying at an even better rate than they had the year before, even while the untouched site was experiencing a decrease in performance.
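That roughly 20% figure is the same percentage-change arithmetic applied to RPV, using the two dollar values above; the helper function below is mine, purely for illustration:

```python
def rpv_lift(ensemble_rpv, baseline_rpv):
    """Relative RPV lift of an ensemble against the dedicated baseline."""
    return (ensemble_rpv - baseline_rpv) / baseline_rpv * 100

baseline_rpv = 6.14   # the "clean" 10% of traffic, August 2017
f1_rpv = 7.44         # best-performing ensemble (F1 alone)

print(f"F1 RPV lift vs. baseline: {rpv_lift(f1_rpv, baseline_rpv):.1f}%")  # ~21.2%
```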

If you notice a dip in site-wide performance from the previous year, don’t jump to blame the tests that are running. It’s more likely that your general performance is moving downward (due to the many potential factors), and continued optimization is one of the best ways to bring that conversion rate or revenue per visitor back up.

What to Do: Make site changes as multivariate (or A/B) optimization tests, and run them against a baseline in real-time.

In order to accurately (and statistically significantly) measure the impact of a change, we must control as many variables as possible. Time, of course, is one of those variables.
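As a minimal sketch of what "statistically significantly" can look like in practice, here is a standard two-proportion z-test comparing a treatment's conversion rate against the control's; the visitor and conversion counts are placeholders, not numbers from either scenario above:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a = conversions_a / visitors_a
    p_b = conversions_b / visitors_b
    p_pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Placeholder counts; swap in the real per-variant totals from your test.
z, p = two_proportion_z_test(conversions_a=1210, visitors_a=100_000,   # control
                             conversions_b=1180, visitors_b=100_000)   # treatment
print(f"z = {z:.2f}, p = {p:.3f}")
```

In this placeholder example, even 100,000 visitors per variant is not enough to call a 0.03 percentage-point gap significant, which is exactly why a change has to be judged against a live control rather than against last year's numbers.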

Instead of keeping track of how many Saturdays are in one month versus another, or the timeline of every promotion you’ve ever run, why not make your life 100x easier and test all site changes in real-time?

Real-time optimization allows you to test potential site changes and truly verify what helps and what hinders. Multivariate optimization is, of course, my favorite, as it allows you to test a large number of new layouts all at once, all compared in real-time against the untouched baseline. In real-time, you don't need to do any calculations or account for bigger or smaller years; the change is simply performing better or worse than the baseline at the moment that matters.

Site performance metrics are always fluctuating. You might have noticed that in the two scenarios, not only did the rates move up and down over time, but even the raw number of visitors to the sites rose and fell. These are very normal trends in the life cycle of a site. It is imperative to continually optimize a website to keep up with changing demand and an evolving customer experience.

Because consumer sentiment changes lightning fast, and historical data cannot be trusted for reliable performance comparisons, you must work with the only timeframe that means anything at all: real-time.

Agree? Disagree? Let me know in the comments!