How Not Picking an Experiment Winner Led to a 227% Increase in Revenue

By now most marketers are familiar with the process of experimentation, identify a hypothesis, design a test that splits the population across one or more variants and select a winning variation based on a success metric. This “winner” has a heavy responsibility – we’re assuming that it confers the improvement in revenue and conversion that we measured during the experiment.

Is this always the case? As marketers we’re often told to look at the scientific community as the gold standard for rigorous experimental methodology. But it’s informative to take a look at where even medical testing has come up short.

For years women have been chronically underrepresented in medical trials, which disproportionately favors males in the testing population. This selection bias in medical testing extends back to pre-clinical stages – the majority of drug development research being done on male-only lab animals.

And this testing bias has had real-world consequences. A 2001 report found that 80% of the FDA-approved drugs pulled from the market for “unacceptable health risks” were found to be more harmful to women than to men. In 2013 the FDA announced revised dosing recommendations of the sleep aid Ambien, after finding that women were susceptible to risks resulting from slower metabolism of the medication.

This is a specific example of the problem of external validity in experimentation which poses a risk even if a randomized experiment is conducted appropriately and it’s possible to infer cause and effect conclusions (internal validity.) If the sampled population does not represent the broader population, then those conclusions are likely to be compromised.

Although they’re unlikely to pose a life-or-death scenario, external validity threats are very real risks to marketing experimentation. That triple digit improvement you saw within the test likely won’t produce the expected return when implemented. Ensuring test validity can be a challenging and resource intensive process, fortunately however it’s possible to decouple your return from many of these external threats entirely.

The experiments that you run have to result in better decisions, and ultimately ROI. Further down we’ll look at a situation where an external validity threat in the form of a separate campaign would have invalidated the results of a traditional A/B test. In addition, I’ll show how we were able to adjust and even exploit this external factor using a predictive optimization approach which resulted in a Customer Lifetime Value (LTV) increase of almost 70%.

(more…)

B2B Marketers Should Stop A/B Testing in 2018

 

Several years ago I was hired to help fix some serious website conversion issues for a B2B SaaS client.

A year earlier the client had redesigned their website pricing page and quickly noticed a significant drop in conversions.

They estimated the redesign cost them approximately $100,000 per month.

The pricing page itself was quite standard; there were several graduating plan tiers with some self-service options and a sales contact form for the enterprise plans.

Pricing Page

Pricing Page

 

Before coming to us, they accelerated A/B testing on the page.  After investing a considerable amount in tools and new team members they had several experiments that showed an increase in the tested goals.  Unfortunately, they were not able to find any evidence that these experiments had generated long-term pipeline or revenue impact.

I advised them of our conversion optimization approach – the types of A/B tests that we could run and plan to recapture their lost conversions.  The marketing team asked lots of questions, but the VP of Marketing stayed silent.  He finally turned to me and said:

“I’m on the hook to increase enterprise pipeline and self-sign up customers by 30% this quarter.  What I really want, is a website that speaks to each customer and only shows the unique features, benefits and plans that’s best for them.  How do we do that?”

He wanted an order of magnitude better solution.  He recognized that his company had to get closer to their customers to win and he wanted a web experience that helped them get there.

I did not think they were ready for that.  I told him that we should start with better A/B testing because what he wanted to do would be very complicated, risky and expensive.

Though that felt like the right answer at the time, it is definitely not the right answer today.

Why Website Experimentation Isn’t Enough for B2B

At a time when the web is vital to almost all businesses, rigorous online experiments should be standard operating procedure.

The often cited Harvard Business review quote from the widely distributed article The Surprising Power of Online Experiments supports years of evidence from digital leaders suggesting that high velocity testing is one of the keys to business growth.

The traditional process of website experimentation involves:

  1.    Gathering Evidence. – “Let’s look at the data to see why we’re losing conversions on this page.
  1.    Forming Hypotheses. – “If we moved the plans higher up on the page we would see more conversions because visitors are not scrolling down.”
  1.    Building and Running Experiments. – “Let’s test a version with the plans higher up on the page.”
  1.    Evaluating Results to Inform the Hypothesis. – “Moving the plans up raised sign ups by 5% but didn’t increase enterprise leads.  What if we reworded the benefits of that plan?”

Every conversion optimization practitioner follows some flavor of this methodology.

Typically, within these experiments, traffic is randomly allocated to one or more variations, as well as the control experience.  Tests conclude when there is either a statistically significant change in an onsite conversion goal or the test is deemed inconclusive (which happens frequently).

If you have strong, evidence-based hypotheses and are able to experiment quickly, this will work well enough in some cases.  Over the years we have applied this approach over thousands of experiments and many clients to generate millions of dollars in return.

However, this is not enough for B2B.  Seeking statistically significant outcomes on onsite metrics often means that traditional website experimentation becomes a traffic-based exercise, not necessarily a value-based one.  While it may still be good enough for B2C sites (e.g. retail ecommerce, travel), where traffic and revenue are highly correlated, it falls apart in many B2B scenarios.

The Biggest Challenges with B2B Website Experimentation

B2B marketers currently face three main challenges with website experimentation as it is currently practiced:

  1.    It does not optimize the KPIs that matter well. – Experimentation does not easily accommodate down-funnel outcomes (revenue pipeline, LTV) or the complexity of B2B traffic and customer journey.
  1.    It is resource-intensive to do right. – Ensuring that you are generating long-term and meaningful business impact from experimentation requires more than just the ability to build and start tests.
  1.    It takes a long time to get results. – Traffic limitations, achieving statistical significance and a linear testing process makes getting results from experimentation a long process.

 

I.  KPIs That Matter

The most important outcome to optimize for is revenue.  Ideally, that is the goal we are evaluating experiments against.

In practice, many B2B demand generation marketers are not using revenue as their primary KPI (because it is shared with the sales team), so it is often qualified leads, pipeline opportunities or marketing influenced revenue instead.  In a SaaS business it should be recurring revenue (LTV).

If you cannot measure it, then you cannot optimize it.  Most testing tools were built for B2C and have real problems measuring anything that happens after a lead is created and further down the funnel, off-website or over a longer period of time.

Many companies spend a great deal of resources on optimizing onsite conversions but make too many assumptions about what happens down funnel.  Just because you generate 20% more website form fills does not mean that you are going to see 20% more deals, revenue or LTV.

You can get visibility into down funnel impact through attribution, but in my experience, it tends to be cumbersome and the analysis is done post-hoc (once the experiment is completed), as opposed to being integrated into the testing process.

If you cannot optimize for the KPIs that matter, the effort that the team puts into setting up and managing tests will likely not yield your B2B company true ROI.

Traffic Complexity and Visitor Context

Unlike most B2C, B2B websites have to contend with all sorts of different visitors across multiple dimensions and often with a long and varied customer journey.  This customer differentiation results in significantly different motivations, expectations and approaches.  Small business end-users might expect a free trial and low priced plan.  Enterprise customers often want security and support and expect to speak to sales.  Existing customers or free trial users want to know why they should upgrade or purchase a complementary product.

An added source of complexity (especially if you are targeting enterprise), is the need to market and deliver experiences to both accounts and individuals.  With over 6 decision makers involved in an enterprise deal, you must be able to speak to both the motivations of the persona/role as well as their account.

One of the easiest ways to come face-to-face with these challenges is to look at the common SaaS pricing page.

Despite my assertions several years ago, the benefits of A/B testing are going to be limited here.  You can change the names or colors of the plans or move them up the page, but ultimately, you are going to be stuck optimizing at the margin – testing hypotheses with low potential impact.

As the VP of Marketing wanted to do with us years ago, we would be better off showing the best plan, benefits and next steps to individual visitors based on their role, company and prior history.  That requires optimization based on visitor context, commonly known as website personalization.

Rules-based Website Personalization

The current standard for personalization is “rules-based” – marketers define fixed criteria (rules) for audiences and create targeted experiences for these them.  B2B audiences are often account, or individual based, such as target industries, accounts, existing customers or job functions.

Unfortunately, website personalization suffers from a lack of adoption and success in the B2B market.  67% of B2B marketers do not use website personalization technology, and only 21% of those that do are satisfied with results (vs 53% for B2C).

Looking at websites that have a major marketing automation platform and reasonably high traffic, you can see the discrepancy between those using commercial A/B testing vs Personalization:

The much higher percentage of sites that use A/B testing vs personalization, suggests that although the value of experimentation is relatively well understood, marketers have not been able to see the same value from personalization.

What accounts for this?  

Marketers who support experimentation subscribe to the idea of gathering evidence to establish causality between website experiences and business improvement.  Unfortunately, rules-based personalization makes the resource-investment and time to value challenges involved with doing this even harder.

II.  Achieving Long-term Impact from Experimentation is Hard and Resource-intensive

At a minimum, to be able to simply launch and interpret basic experiments, a testing team should have skills in UX, front-end development and analytics – and as it turns out, that is not even enough.

Testing platforms have greatly increased access for anyone to start experiments.  However, what most people do not realize is that the majority of ‘winning’ experiments are effectively worthless (80% per Qubit Research) and have no sustainable business impact. The minority that do make an impact tend to be relatively small in magnitude.

It is not uncommon for marketers to string together a series of “winning” experiments (positive, statistically significant change reported by the testing tool) and yet see no long-term impact to the overall conversion rate.  This can happen through testing errors or by simply changing business and traffic conditions.

As a result, companies with mature optimization programs will typically also need to invest heavily in statisticians and data scientists to validate and assess the long-term impact of test results.

Rules-based personalization requires even more resources to manage experimentation across multiple segments.  It is quite tedious for marketers to set up and manage audience definitions and ensure they stay relevant as data sources and traffic conditions change.

We have worked with large B2C sites with over 50 members on their optimization team.  In a high volume transactional site with homogeneous traffic, the investment can be justified.  For the B2B CMO, that is a much harder pill to swallow.

III. Experimentation Takes a Long Time

In addition to being resource intensive, getting B2B results (aka revenue) from website testing takes a long time.

In general, B2B websites have less traffic than their B2C counterparts.  Traffic does have a significant impact on the speed of your testing, however, for our purposes that is not something I am going to dwell on, as it is relatively well travelled ground.

Of course, you do things to increase traffic, but many of us sell B2B products in specific niches that are not going to have the broad reach of a consumer ecommerce site.

What is more interesting, is why we think traffic is important and the impact that has on the time to get results from testing.

You can wait weeks for significance on an onsite goal (which as I have discussed, has questionable value).  The effect that this has on our ability to generate long term outcomes, however, is profound.  By nature, A/B testing is a sequential, iterative process, which should be followed deliberately to drive learnings and results.

The consequence of all of this is that you have to wait for tests to be complete and for results to be analyzed and discussed before you have substantive evidence to inform the next hypothesis.  Of course, tests are often run in parallel, but for any given set of hypotheses it is essentially a sequential effort that requires learnings be applied linearly.

 

This inherently linear nature of testing, combined with the time it takes to produce statistically significant results and the low experiment win rate, makes actually getting meaningful results from a B2B testing program a long process.

It is also worth noting that with audience-based personalization you will be dividing traffic across segments and experiments.  This means that you will have even less traffic for each individual experiment and it will take even longer for those experiments to reach significance.

Is there Better Way to Improve B2B Website Conversions?

The short answer?  Yes.

At FunnelEnvy, we believe that with context about the visitor and an understanding of prior outcomes, we can make better decisions than with the randomized testing that websites are using today.  We can use algorithms that are learning and improving every decision tree, continuously, to achieve better results with less manual effort from our clients.

Our “experimentation 2.0” solution leverages a real-time prediction model.  Predictive models use the past to predict future outcomes based on available signals.  If you have ever used predictive lead scoring or been on a travel site and seen “there is an 80% chance this fare will increase in the next 7 days,” then you have seen prediction models in action.

In this case, what we are predicting, is the best website visitor experience that will lead to an optimal outcome.  Rather than testing populations in aggregate, we are making experience predictions on a 1:1 basis based on all of the available context and historical outcomes.  Our variation scores take into account expected conversion value as well as conversion probability, and we continuously learn from actual outcomes to improve our next predictions.

Ultimately, the quality of these predictions is based on the quality of the signals that we provide the model and the outcomes that we are tracking.  By bringing together behavioral, 1st party and 3rd party data we are building a Unified Customer Profile (UCP) for each visitor and letting the algorithm determine which attributes are relevant signals.  To ensure that our predictive model is optimizing for the most important outcomes, we incorporate Full Funnel Goal Tracking for individual (MQL, SQL) and account (opportunities, revenue, LTV) outcomes.

Example:  Box’s Homepage Experience

To see what a predictive optimization approach can do, let’s look at a hypothetical example:

Box.com has an above the fold Call to Action (CTA) that takes you to their pricing page.  This is a sensible approach when you do not have a lot of context about the visitor because from the pricing page, you can navigate to the right plan and option that is most relevant.

Of course, they are putting a lot of burden on the visitor to make a decision.  There are a total of 9 plans and 11 CTAs on that pricing page alone, and not every visitor is ready to select one – many still need to be educated on the solution.  We could almost certainly increase conversions if we made that above the fold experience more relevant to a visitor’s motivations.

SMB visitors might be ready to start the free trial once they have seen the demo, the enterprise infosec team might be interested in learning about Box’s security features first, customers who are not ready to speak to sales or sign up might benefit from the online demo, and decision makers at enterprise accounts and who are engaged might be ready to fill out the sales form.

Modifying the homepage sub-headline and CTA to accommodate these experiences could look something like the image below.  Note that they take you down completely different visitor journeys, something you would never do with a traditional A/B test.

If we had context about the visitor and historical data we could predict the highest probability experience that would lead to both onsite conversion as well as down funnel success.  The prediction would be made on a 1:1 basis as the model determines which attributes are relevant signals.

Finally, because we are automating the learning and prediction model, this would be no more difficult than adding variations to an A/B test, and far simpler and with higher precision than rules-based personalization.  The team would be alleviated from having to do the analytical heavy lifting, new variations could be added over time and changing conditions would automatically be incorporated into the model.

Conclusion

Achieving “10X” improvements in today’s very crowded B2B marketplace requires shifts in approach, process and technology.  Our ability to get closer to customers is going to depend on better experiences that you can deliver to them, which makes the rapid application of validated learnings that much more important.

“Experimentation 1.0” approaches gave human marketers the important ability to test, measure and learn, but the application of these in a B2B context raises some significant obstacles to realizing ROI.

As marketers, we should not settle for secondary indicators of success or delivering subpar experiences.  Optimizing for a download or form fill and just assuming that is going to translate into revenue is not enough anymore.  Understand your complex traffic and customer journey realities to design better experiences that maximize meaningful results, instead of trying to squeeze more out of testing button colors or hero images.

Finally, B2B marketers should no longer wait for B2C oriented experimentation platforms to adopt B2B feature sets.  “Experimentation 2.0” will overcome our human limitations to let us realize radically better results with much lower investment.

New platforms that prioritize relevant data and take advantage of machine learning at scale will alleviate the limitations of A/B testing and rules-based personalization.  Solutions built on these can augment and inform the marketers’ creative ability to engage and convert customers at a scale that manual experimentation cannot approach.

This post originally appeared on LinkedIn.

By |2018-03-07T21:03:40-08:00January 19th, 2018|Uncategorized, Experimentation|0 Comments
Go to Top