Measuring the incremental effect of online marketing to optimize advertising investment
One of the core values at Zalando is to be Customer Obsessed, and this applies to online marketing as well. For many Zalando customers, their experience starts with a catchy ad. Therefore, in Personalized Marketing, our mission is to reach customers with a personalized message and suggest products tailored to their needs or wants.
By increasing the relevance of marketing, we aim to increase the number of customers interested in our offer, and, in turn, generate profitable sales for Zalando. While doing so, we constantly face the “never-out-of-fashion” (to quote our latest Christmas campaign) question: What does marketing really do?
Simple question, complex answers
So the central question is, “What is the true incremental effect of online marketing?”
- Would a customer have bought this pair of shoes even in the absence of marketing?
- How much are we growing Zalando’s customer base thanks to online marketing?
The answers to these questions are complex and multi-faceted. Measuring incrementality without mistaking a success for a failure can be nearly impossible, as shown in [1]. Even well-established and successful tech giants have not finished answering this question. Hohnhold, O’Brien and Tang [2] tried to find the best way to measure the impact of marketing beyond its short-term effect: optimizing for the next few days or weeks may lower the impact of online marketing in the long run. As these Google researchers put it, “We have long recognized that optimizing for short-term revenue may be detrimental in the long term if users learn to ignore the ads.”
One of our objectives is to compute a Return on Investment (ROI) for every campaign. This metric allows us to allocate our resources efficiently: given an ROI target, we maximize sales generation and new customer acquisition for every campaign.
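As a minimal sketch of this idea, the snippet below computes a per-campaign ROI and compares it against a target to guide budget allocation. The campaign names, figures and target are purely illustrative, not Zalando data:

```python
# Hypothetical sketch: per-campaign ROI as incremental profit over spend.
# All campaign names and numbers below are made up for illustration.

def roi(incremental_profit: float, spend: float) -> float:
    """Return on investment: incremental profit generated per unit of spend."""
    return incremental_profit / spend

campaigns = {
    "search_brand":     {"incremental_profit": 120_000.0, "spend": 40_000.0},
    "display_retarget": {"incremental_profit": 30_000.0,  "spend": 25_000.0},
}

ROI_TARGET = 1.5  # hypothetical target set by the business

for name, c in campaigns.items():
    r = roi(c["incremental_profit"], c["spend"])
    decision = "scale up" if r >= ROI_TARGET else "scale down"
    print(f"{name}: ROI = {r:.2f} -> {decision}")
```

In practice the hard part is not this division but estimating the numerator, the *incremental* profit, which is what the rest of the pipeline described below is for.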
Performance measurement landscape
Zalando took up the challenge and aims to measure the performance of online marketing at scale. The ROI of our marketing activities is computed through a pipeline composed of several products. While each of them would deserve a dedicated blog post, this article aims to simply outline their purposes and main challenges.
Figure 1: Product Pipeline Overview
We have built a flexible and scalable data infrastructure based on S3, Hive and Spark on AWS. Spark’s parallelizing capabilities, in combination with AWS EC2, ensure that we can meet our strict SLAs even with a continuously growing number of customers and amount of traffic. In the future, we plan to automatically scale the size of the cluster depending on the size of the input data. We decoupled our sub-products and use Hive tables as interfaces between them. This allows for more autonomy in product development and generally lets us move faster.
At the start of our pipeline, we source all marketing clicks, sales and conversions (e.g. customer acquisitions, app installs) from Zalando’s Data Lake and DWH to build a structured and unified event data layer. This is one of our greatest challenges, since the data is very diverse in terms of quality, update frequency, syntax and semantics. Therefore, we are making great efforts to move from client-side towards server-side ad tracking, and we closely monitor our data through data quality dashboards. After updating the event data layer, we use our internal cross-device graph to create the customers’ journeys across all their devices, from first ad interaction to conversion.
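The journey-stitching step can be sketched roughly as follows. This is a toy Python illustration (the production system runs on Spark): a simple `device -> customer` mapping stands in for the cross-device graph, and all identifiers and events are invented:

```python
# Hedged sketch: stitching per-device events into cross-device journeys.
# The device_graph dict is a stand-in for the output of a cross-device graph;
# all IDs, timestamps and channels are illustrative.
from collections import defaultdict

# device_id -> customer_id (assumed to come from the cross-device graph)
device_graph = {"phone_1": "cust_A", "laptop_1": "cust_A", "tablet_9": "cust_B"}

# unified event layer: (timestamp, device_id, event_type, channel)
events = [
    (3, "laptop_1", "conversion", None),
    (1, "phone_1",  "ad_click",   "display"),
    (2, "phone_1",  "ad_click",   "search"),
    (1, "tablet_9", "ad_click",   "social"),
]

def build_journeys(events, device_graph):
    """Group events by customer across devices and order them in time."""
    journeys = defaultdict(list)
    for ts, device, etype, channel in events:
        customer = device_graph.get(device)
        if customer is not None:  # drop events we cannot map to a customer
            journeys[customer].append((ts, etype, channel))
    for journey in journeys.values():
        journey.sort()  # chronological order within each journey
    return dict(journeys)

print(build_journeys(events, device_graph))
```

Note how `cust_A`'s journey spans two devices: the ad clicks happen on the phone, the conversion on the laptop, which is exactly the case a cross-device graph is needed for.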
Next, with our attribution model, we determine how much incremental value was created by every ad click. A particularity of the attribution problem is that its ground truth is unknown. As we cannot interview every single customer, we will never know exactly why a given customer bought their latest jacket on Zalando. We built a framework that allows us to iterate quickly and test many different attribution models. We use SQL for simple transformations, while Scala is our choice for more complex computations. This way, we are able to explore far beyond simple models (e.g. last-touch) and leverage our huge dataset with more complex models.
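To make the modelling choice concrete, here are two textbook attribution rules, last-touch and linear, sketched in Python (the post mentions the production framework uses SQL and Scala; this toy version only illustrates how different rules split the same conversion value across a journey's clicks):

```python
# Hedged sketch: two classic attribution rules over one journey's ad clicks.
# Channel names and the conversion value are illustrative.

def last_touch(clicks: list, value: float) -> dict:
    """Give all credit for the conversion to the final click."""
    credit = {c: 0.0 for c in clicks}
    credit[clicks[-1]] += value
    return credit

def linear(clicks: list, value: float) -> dict:
    """Split credit evenly across all clicks in the journey."""
    credit = {}
    for c in clicks:
        credit[c] = credit.get(c, 0.0) + value / len(clicks)
    return credit

journey = ["display", "search", "retargeting"]  # clicks before a 60 EUR sale
print(last_touch(journey, 60.0))  # retargeting gets everything
print(linear(journey, 60.0))      # 20 EUR of credit per channel
```

The two rules disagree sharply on which channel "caused" the sale, which is precisely why the calibration against randomized experiments described below matters.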
Figure 2: Attribution Illustration
As reality is unknown, we run many randomized experiments with the aim of causally inferring the incremental impact of each marketing campaign. We use geo-based [3] and audience-based test methodologies to achieve this. In the former, marketing activities are turned off in certain regions and we quantify the impact on revenue, profit and customer acquisition. The latter splits a given customer base into two groups, giving one group a specific treatment and measuring the difference in behaviour. We use the results to calibrate our attribution and ensure it reflects reality.
Continuously running such a large number of parallel experiments is a great challenge. The test results need to accurately reflect the incrementality of marketing campaigns, even though it can be highly affected by seasonality or ever-changing consumer behaviour. Hence, we are currently building an experimentation platform that sets up experiments and analyzes the results in an automated way.
Is That All?
The next logical step is bidding based on the ROI. We invest a lot of resources in predicting the performance of marketing. Every day, we estimate the incremental profit marketing campaigns will generate in the coming weeks. Each impression can lead to a click, each click can lead to a conversion. Every marketing campaign is a different time series, with its own behavior and characteristics. The magnitudes of different time series may vary by several orders, and while most of them are quite unique, it is possible to infer some similarities (embeddings are one solution). We are experimenting with state-of-the-art machine learning models such as DeepAR [4]. All of this makes it an extremely complicated and deeply interesting problem to model.
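As a much simpler stand-in for models like DeepAR, the following toy example fits a first-order autoregressive model to one campaign's daily profit series and rolls it forward. The series is invented and the model is deliberately naive; it only illustrates the shape of the forecasting task:

```python
# Hedged sketch: a toy AR(1) forecast for one campaign's daily profit,
# standing in for probabilistic sequence models like DeepAR.
# The profit series is illustrative.

def fit_ar1(series):
    """Least-squares fit of x[t] = a * x[t-1] + b."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    b = my - a * mx
    return a, b

def forecast(series, horizon, a, b):
    """Roll the fitted AR(1) model forward `horizon` steps."""
    out, last = [], series[-1]
    for _ in range(horizon):
        last = a * last + b
        out.append(last)
    return out

profit = [100.0, 104.0, 109.0, 113.0, 118.0, 122.0]  # illustrative daily profit
a, b = fit_ar1(profit)
print(forecast(profit, 3, a, b))
```

A model like DeepAR improves on this in two ways the paragraph above hints at: it is trained jointly across many related series (so similar campaigns inform each other), and it produces a full predictive distribution rather than a point forecast.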
The measurement of incrementality opens up many interesting topics that we also tackle in the Personalized Marketing Team, such as generating the best ads or setting the best target and budget.
Thanks to Pablo Croppi, Carolyn Hodgson, Dirk Petzoldt, Dominik Rief for reviewing this article, and to Yanwolf Hoffmann for design help.
[1] Randall A. Lewis and Justin M. Rao. On the Near Impossibility of Measuring the Returns to Advertising, 2013
[2] H. Hohnhold, D. O’Brien, D. Tang. Focusing on the Long-term: It’s Good for Users and Business, 2015
[3] J. Vaver and J. Koehler. Measuring Ad Effectiveness Using Geo Experiments, 2011
[4] V. Flunkert, D. Salinas, J. Gasthaus. DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks, 2017