Overview

As participants in the Innovation Track of the Meta APAC Robyn Hackathon 2022, we propose several improvements to the use of Robyn, an open-source Marketing Mix Modeling (MMM) R package developed by the Meta Marketing Science team. Our proposal draws mostly on several months of field experience with Robyn, which suggests that a gap remains between modeling and practice.

To help analysts understand the impact of various marketing activities on business KPIs, Robyn helps them select the best marketing mix model and then recommends the corresponding optimal budget allocation. On the one hand, Robyn significantly lowers the entry barrier to building a model that is aligned with the analyst’s insights. On the other hand, it remains difficult for the analyst to judge how accurate the estimation results are.

In particular, there is still a lack of detail regarding the validation procedure and the cautions needed when interpreting the results. To this end, we provide a set of guidelines for validating models generated by Robyn. In addition, we highlight parts of the conventional implementation that may confuse analysts when they interpret and validate the estimation results, and we provide alternative functions and equations to address them. Finally, we propose additional insights that can be derived from the models and define convenience functions that improve the usability of Robyn.

Key Innovations

Robyn is one of the most novel attempts to bridge marketers’ insights and machine learning algorithms, so it offers many benefits but also leaves notable room for improvement. Even when an analyst finds a model that explains reality well, the question remains of how the analyst can verify the accuracy of that model.

A validation procedure that checks how accurately a statistical inference model reflects the past and predicts the future is essential, yet the currently suggested methods have limitations.

First, Robyn currently supports the robyn_refresh function that updates the previously built models with additional data, while maintaining the previously selected hyperparameters. Based on the new data, the robyn_refresh function compares the actual and predicted response; however, this comparison is not for validating the accuracy of the previous model. Rather, it is for generating a completely new model for the extended data set.

Second, Robyn currently bases most of its model explanations on changes in spending and response. Such explanations can lead to misinterpretation in two ways: (1) the response does not fully cover the actual dependent variable, and (2) analysts might overlook the fact that the response relates only to paid media channels, not to other components (e.g., seasonality, competition).

To this end, we devised two approaches that take these considerations into account.

The first approach we suggest is the “response-driven” approach. It first decomposes the dependent variable during the training period into the portion affected by paid media channels and the remainder, and then applies the predicted response changes to the former only.

$$ \hat{Y}_{total} = Y_{non\text{-}media} + Y_{media} \times \frac{response^{predict}}{response^{training}} $$
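The formula above can be sketched in a few lines of Python. This is a minimal illustration rather than part of Robyn: the function name `response_driven_prediction` and its arguments are hypothetical, and it assumes that the four quantities are scalars aggregated over comparable periods (for example, totals over training and test windows of equal length).

```python
def response_driven_prediction(y_non_media, y_media, response_training, response_predict):
    """Scale the media-driven part of the dependent variable by the response ratio.

    Hypothetical helper (not a Robyn function). All inputs are assumed to be
    scalars aggregated over comparable periods.
    """
    if response_training <= 0:
        raise ValueError("response_training must be positive")
    # Ratio of predicted response to the response observed in training.
    scale = response_predict / response_training
    # Only the media portion is rescaled; the non-media portion is kept as-is.
    return y_non_media + y_media * scale


# Illustrative numbers only; they do not come from any real model.
y_hat_total = response_driven_prediction(
    y_non_media=600.0,
    y_media=400.0,
    response_training=200.0,
    response_predict=250.0,
)
print(y_hat_total)
```

Keeping the non-media portion outside the scaling is the point of the decomposition: a change in predicted response moves only the part of the dependent variable that paid media explained during training.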

The second approach we suggest is the “dependent-driven” approach, which predicts the portion of the dependent variable explained by paid media channels by transforming the test input (i.e., spending on paid media channels in the test period) with the same transformation rules used for the training period and multiplying the result by the estimated coefficients.

$$ \hat{Y}_{total} = Y_{non\text{-}media} + \hat{Y}_{media}(\cdot \mid hyperparameters^{training}) $$
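As a rough sketch of this idea, the Python below applies a geometric adstock and a Hill-type saturation curve to test-period spend, using parameters fixed during training, and then multiplies by the estimated coefficient. It is a simplification under stated assumptions, not Robyn's implementation: `ChannelParams`, `inflexion`, and the function names are hypothetical, the adstock carryover from the end of the training period is assumed to be zero, and the non-media portion for the test period is assumed to be supplied by the analyst.

```python
from dataclasses import dataclass


@dataclass
class ChannelParams:
    """Hypothetical container for values fixed during training."""
    theta: float      # geometric adstock decay rate, 0 <= theta < 1
    alpha: float      # Hill curve shape parameter, > 0
    inflexion: float  # Hill half-saturation point, > 0
    coef: float       # estimated coefficient for the transformed channel


def geometric_adstock(spend, theta):
    """Carry over a share `theta` of the previous adstocked value."""
    adstocked, previous = [], 0.0  # assumption: no carryover from training
    for x in spend:
        previous = x + theta * previous
        adstocked.append(previous)
    return adstocked


def hill_saturation(x, alpha, inflexion):
    """Map adstocked spend to a value in [0, 1) with diminishing returns."""
    return x ** alpha / (x ** alpha + inflexion ** alpha)


def dependent_driven_prediction(y_non_media, test_spend, params):
    """Add the media contribution, computed with training parameters,
    to the non-media portion for each test period."""
    y_hat = list(y_non_media)
    for channel, spend in test_spend.items():
        p = params[channel]
        for t, x in enumerate(geometric_adstock(spend, p.theta)):
            y_hat[t] += p.coef * hill_saturation(x, p.alpha, p.inflexion)
    return y_hat


# Illustrative numbers only; they do not come from any real model.
params = {"tv": ChannelParams(theta=0.5, alpha=1.0, inflexion=100.0, coef=200.0)}
prediction = dependent_driven_prediction(
    y_non_media=[500.0, 500.0],
    test_spend={"tv": [100.0, 0.0]},
    params=params,
)
print(prediction)
```

Nothing is re-estimated on the test data here: the decay, shape, half-saturation point, and coefficient all come from training, so the comparison with actual values is a genuine out-of-sample check.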

Whichever approach is taken, we can compare the predicted dependent variable with the actual values in the test period, which allows us to validate how accurate the selected models are.
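One way to make this comparison concrete is to compute standard error metrics on the test period. The sketch below uses MAPE and a range-normalized RMSE. The choice of metrics and the function names are our assumptions for illustration, and the inputs are assumed to be equal-length sequences of actual and predicted values with nonzero actuals.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error; assumes no actual value is zero."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def nrmse(actual, predicted):
    """Root mean squared error divided by the range of the actual values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    rmse = math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )
    spread = max(actual) - min(actual)
    if spread == 0:
        raise ValueError("actual values must not all be identical")
    return rmse / spread


# Illustrative numbers only; they do not come from any real model.
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
print(mape(actual, predicted), nrmse(actual, predicted))
```

Lower values mean the selected model reproduces the held-out period more closely. Applying the same metrics to both approaches keeps their results comparable.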

Further details on the two approaches are explained in the following sections.

Key #1 Response-Driven Approach