Measuring uplift using control experiments
In this series we look at the experiments we are conducting to try to understand the value of post-view conversions.
What is uplift?
In this context, uplift measures the proportion of conversions that happened only because of a particular advertising activity, as opposed to conversions from users who were served an ad but would have converted regardless. This is particularly important for post-view conversions and for retargeting. Advertisers always ask: “Surely some of these conversions would have happened anyway?” They are completely correct: some would have happened anyway. The key question is how many would have happened without the advertising activity, and how many happened because of it. Uplift is a way of measuring this.
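As a minimal sketch of the arithmetic (the function name and inputs are ours, purely for illustration), uplift can be expressed as the relative difference between the conversion rate of users who saw the brand's ads and the baseline conversion rate of comparable users who did not:

```python
def uplift(cr_brand: float, cr_control: float) -> float:
    """Relative uplift: the extra conversions caused by the ads,
    expressed as a fraction of the baseline (control) rate."""
    return (cr_brand - cr_control) / cr_control
```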
Why measure uplift?
Currently, many advertisers set key targets and goals based on either CPA or ROI, and most use a last-click or last-touchpoint attribution method. Affiliate programmes, for example, only pay out on a sale: the advertiser is happy to pay a certain amount for each sale generated because it still profits from that sale. This all sounds great but, as we have said before, the rules of the game determine how the game is played. Buyers do not necessarily have to generate sales, merely find them. One tactic is to drop cookies as many times as possible, as cheaply as possible; credit can then be claimed regardless of whether the ad actually influenced the user's journey, or was even seen at all. This methodology exacerbates the fraud problem because it rewards cheap, worthless, and nefarious publishers for their ability to inexpensively deliver a tracking cookie.
The solution is to change the attribution system to measure uplift. A move to true incremental lift measurement will eliminate bad actors, because bogus impressions produce no uplift and we can therefore optimise away from them. The same is true of poor inventory, where the ads may not be seen at all, or are not seen for very long.
Xelsion’s approach
How to experiment
To measure uplift, Xelsion uses Randomised Controlled Trials (RCTs), also called control experiments. These experiments isolate the effect of one variable on a system by holding constant all variables except the one under observation. Users are randomly split into two groups: one group receives ads for the brand being tested (Brand Ads), while the other receives completely unrelated ads, such as public service announcements (PSAs). This is the control group, and we call these ads the Control Ads.
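As an illustration of the random split (a sketch of one common approach, not necessarily the exact mechanism we use; the user_id field and the 50/50 split are assumptions), hashing a stable user identifier ensures each user lands in the same group on every ad request:

```python
import hashlib

def assign_group(user_id: str, brand_fraction: float = 0.5) -> str:
    """Deterministically assign a user to the Brand or Control group.

    Hashing the user id (rather than calling random() per request)
    guarantees a user always falls into the same group, keeping the
    two groups cleanly separated for the whole experiment.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return "brand" if bucket < brand_fraction else "control"
```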
Running such an experiment is not simple: it is absolutely essential that both groups are treated exactly the same, are kept completely separate, and are exposed to no other advertising for the brand. Other potential difficulties include sensitive brand attributes; with gambling ads, for example, some publishers will not serve the Brand Ads, so the brand group sees a different mix of inventory from the control group. We also cannot use optimisation engines during the experiment, as they would produce different bids for the two groups. Finally, we must collect enough data to draw considered conclusions rather than jumping to conclusions too early.
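On the question of collecting enough data, a rough power calculation shows why these experiments cannot be cut short. The sketch below uses the standard two-proportion normal approximation; the alpha and power choices and the example rates are illustrative assumptions, not thresholds we actually used:

```python
import math

def n_per_group(p_control: float, p_brand: float,
                z_alpha: float = 1.645,    # two-sided alpha = 0.10
                z_beta: float = 0.842) -> int:  # power = 0.80
    """Approximate users needed per group to detect the difference
    between two conversion rates (two-proportion normal approximation)."""
    var = p_control * (1 - p_control) + p_brand * (1 - p_brand)
    return math.ceil((z_alpha + z_beta) ** 2 * var / (p_brand - p_control) ** 2)

# With conversion rates around 0.255% vs 0.370%, this suggests roughly
# 29,000 users per group; small effects need large samples.
print(n_per_group(0.00255, 0.00370))
```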
Recent results
Our most recent experiment focused on retargeting visitors to a popular e-commerce website who had visited the site less than one day earlier. The Brand Ads had a conversion rate of 0.370%, whereas the Control Ads had a conversion rate of 0.255%. The uplift generated by our advertising was therefore 44% over what would have happened anyway, meaning that 31% of the sales seen by the Brand Ads happened only because of those ads. We can therefore attribute 31% of post-view sales to our activity.
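For the arithmetic behind those figures (a sketch using the rounded rates quoted above, so the results may differ slightly from those computed on the unrounded data):

```python
cr_brand, cr_control = 0.00370, 0.00255  # conversion rates from the experiment

uplift = (cr_brand - cr_control) / cr_control       # ~0.45 from rounded rates
attributable = (cr_brand - cr_control) / cr_brand   # ~0.31 of Brand Ad sales

print(f"uplift: {uplift:.0%}, attributable share of sales: {attributable:.0%}")
```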
As with any random process, there is some uncertainty. A simple 90% confidence interval tells us the uplift was in the range of 20%-72%. This range is quite wide, but we can narrow it using data from other metrics, such as the number of basket conversions; from this we estimate a confidence interval of 26% to 56% uplift. This is still quite wide and could be narrowed further by running for longer. In our case, however, we are continuing with slightly different experiments (e.g. users who visited 7-14 days ago rather than less than one day ago), and we can combine all the experiments to narrow the interval further. This maximises learning across a number of areas without spending too much on the experimentation.
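We have not shown the interval calculation here, but one standard way to put a confidence interval on relative uplift is a log-normal approximation to the ratio of the two conversion rates (the Katz interval). The sketch below is illustrative only; the conversion counts and sample sizes are hypothetical, not the experiment's actual figures:

```python
import math

def uplift_ci(conv_b: int, n_b: int, conv_c: int, n_c: int,
              z: float = 1.645) -> tuple[float, float]:
    """Confidence interval for relative uplift via a log-normal
    approximation to the rate ratio (Katz interval).
    z = 1.645 gives 90% coverage."""
    rate_ratio = (conv_b / n_b) / (conv_c / n_c)
    se = math.sqrt(1 / conv_b - 1 / n_b + 1 / conv_c - 1 / n_c)
    low = rate_ratio * math.exp(-z * se) - 1   # uplift = rate ratio - 1
    high = rate_ratio * math.exp(z * se) - 1
    return low, high

# Hypothetical counts at the quoted rates, for illustration only:
print(uplift_ci(conv_b=148, n_b=40_000, conv_c=102, n_c=40_000))
```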
How can we use this result?
It is fair to say that no one really knows what the outcome will be until this sort of experiment is tried. The results depend on so many factors, such as the type of targeting, the creative, viewability settings, brand awareness, the size of the brand, and the vertical, that results from one advertiser rarely translate into similar results for another. In this case, for example, the conversion rates are quite high, indicating a popular brand that people would be buying from anyway; the same applies because we are retargeting existing visitors. We therefore expect the uplift figures to be relatively low.
By running this experiment, we now know how much of our post-view activity is genuinely incremental for this advertiser, and we can use that figure to set more realistic targets and to optimise away from inventory that produces no uplift.
Keep checking back for more updates on our ongoing experiments and on how we use the knowledge they give us to drive additional value for our advertisers.