Introduction
Taboola is the world’s leading native advertising platform, serving thousands of websites with a wide variety of products including Taboola Feed, in-article ads and homepage personalization, generating hundreds of millions of impressions daily. Despite our tireless efforts to optimize performance through sophisticated features, advanced models, and robust infrastructure, navigating such complicated ecosystems inevitably reveals areas of suboptimization.
While some widgets feature ads exclusively, most display a mix of organic articles and ads (some also contain e-commerce and paywall content), as mixing in organic content goes a long way toward avoiding ad blindness and generating page views.
Taboola currently uses a static allocation of space between organic content and ads, which does not adapt to user preferences or inventory opportunities. In this blog post, I’ll delve into our solution to this challenge, explaining the obstacles we encountered and the adjustments we made to overcome them.
Current configuration
At the core of our strategy is a nuanced understanding of our publishers, to whom we deliver two core values:
- Direct monetization through ad clicks.
- Increase in pageviews (PV) driven by organic clicks.
Direct revenue through ad clicks is simple to evaluate for both Taboola and publishers. Organic clicks generate substantial revenue for the publisher through alternative monetization methods, but that value does not translate into revenue for Taboola in the same way.
The balance between these two values (revenue/PV) is a unique attribute tailored for each publisher. Ideally, it should correlate with the ratio of Taboola’s cumulative revenue to other revenue streams. For example, if a publisher gets a much higher value per PV from other forms of monetization, such as subscriptions, than from a click on a Taboola ad, the inclination would be to show more organic content and fewer ad slots in order to gain more subscribers. Organic content would then be prioritized over ads in Taboola widgets.
For sponsored articles, we have two types of models: one for predicting eCTR (estimated click-through rate), and one for predicting eCVR (estimated conversion rate) and setting the ad CPC (cost per click). We calculate eCPM = eCTR * CPC for each article and rank them accordingly. For organic articles, we use only one model to predict eCTR and use its output to rank them.
Unified ranking solution for mixing ad space
Two main principles guide the solution:
- PVs from clicks on Taboola’s organic recommendations generate value for the publisher and therefore Taboola, so we should be able to put a price on that value.
- To find the optimal allocation for our organic articles and ads, we need to rank them together.
Combining these two principles, we came up with the idea of assigning an eCPC to organic articles (organic eCPC) and using it to calculate an organic eCPM (estimated cost per mille), which allows us to rank organic articles and ads together on the same scale.
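As a rough illustration of this idea, the sketch below ranks sponsored and organic candidates on a single score. The `Candidate` class, its field names, and the sample numbers are hypothetical and are not Taboola's implementation. The score follows the eCTR * CPC formula described earlier, with a publisher-level organic eCPC standing in for the CPC of organic items.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    item_id: str
    ectr: float                  # predicted click-through rate
    cpc: Optional[float] = None  # advertiser CPC; None marks an organic item


def score(c: Candidate, organic_ecpc: float) -> float:
    """eCTR * CPC for ads, eCTR * organic eCPC for organic items."""
    price = c.cpc if c.cpc is not None else organic_ecpc
    return c.ectr * price


def rank_unified(cands: List[Candidate], organic_ecpc: float, slots: int) -> List[Candidate]:
    """Sort all candidates by the shared score and keep the top slots."""
    ranked = sorted(cands, key=lambda c: score(c, organic_ecpc), reverse=True)
    return ranked[:slots]


# Hypothetical candidates for a three-slot widget.
candidates = [
    Candidate("ad_a", ectr=0.010, cpc=0.50),
    Candidate("ad_b", ectr=0.020, cpc=0.10),
    Candidate("org_a", ectr=0.030),
    Candidate("org_b", ectr=0.015),
]
top = rank_unified(candidates, organic_ecpc=0.12, slots=3)
print([c.item_id for c in top])
```

Raising the organic eCPC in this sketch moves organic items up the ranking, and lowering it favors ads, which is the lever the rest of the post is about.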
But how can we determine this organic eCPC?
Implementation
Our solution includes three steps:
- Determination of organic eCPC for each publisher.
- eCPM estimation for both organic and ad elements.
- Ranking the elements accordingly.
Organic eCPC
The main challenge is determining the right organic eCPC to meet our specified goals, whether it’s hitting target revenue per day or maintaining a specific revenue-to-engagement ratio.
As a preliminary step, we decided to simulate the ranking we would have achieved using a single organic eCPC for all organic articles on one of our international websites. Analyzing the results, we found that while the overall ratio of organic to sponsored articles seemed reasonable, some cohorts of users were receiving only organic articles. This wasn’t initially alarming, but we thought it would be a good idea to see if these users had anything in common. To our surprise, they did: the vast majority of these users were from developing countries.
In hindsight this makes sense, as most campaigns set up in Taboola use location-based targeting. Users in different countries have different ad inventory available to them, and the items available to users in developing countries mostly had low CPCs, so our organic items almost always outranked them.
To solve this problem, we calculated the average eCPM for each country and normalized the organic eCPC accordingly, allowing us to set the eCPC once for each country.
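The post does not give the exact normalization formula, so the following is only a sketch under an assumption: a single base organic eCPC is scaled linearly by each country's average ad eCPM relative to a reference eCPM. The function name, the reference value, and the placeholder country codes are all hypothetical.

```python
from typing import Dict


def normalized_organic_ecpc(base_ecpc: float,
                            avg_ecpm_by_country: Dict[str, float],
                            reference_ecpm: float) -> Dict[str, float]:
    """Scale one base organic eCPC by each country's average ad eCPM.

    Assumption: the scaling is linear in avg_ecpm / reference_ecpm.
    """
    if reference_ecpm <= 0:
        raise ValueError("reference_ecpm must be positive")
    return {
        country: base_ecpc * avg_ecpm / reference_ecpm
        for country, avg_ecpm in avg_ecpm_by_country.items()
    }


# Hypothetical average eCPMs; "XX" and "YY" are placeholder country codes.
avg_ecpm = {"XX": 4.0, "YY": 1.0}
per_country = normalized_organic_ecpc(
    base_ecpc=0.12, avg_ecpm_by_country=avg_ecpm, reference_ecpm=4.0
)
print(per_country)
```

With this kind of scaling, organic items in a low-CPC country compete against ads at a proportionally lower price, so they no longer win every slot by default.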
Determination of initial organic eCPC
As mentioned above, to assess the potential impact of our solution, I ran an analysis using real requests from different locations and several major publishers. We checked what the DCG (Discounted Cumulative Gain) would be for both revenue and organic clicks under different shadow bids (candidate organic eCPC values), and ended up with a Pareto frontier showing the trade-off between revenue and pageviews generated. This beat our current static allocation by a hundred percent for both organic clicks and revenue.
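A toy version of this sweep might look like the sketch below. It is not the actual analysis code: the item tuples, the bid grid, and the use of a base-2 logarithm for the position discount are assumptions made for illustration.

```python
import math
from typing import List, Optional, Tuple

Item = Tuple[float, Optional[float]]  # (eCTR, CPC); CPC is None for organic items
Point = Tuple[float, float, float]    # (shadow bid, revenue, organic clicks)


def simulate(items: List[Item], shadow_bid: float, slots: int) -> Tuple[float, float]:
    """Return (discounted revenue, discounted organic clicks) for one request."""
    ranked = sorted(
        items,
        key=lambda it: it[0] * (it[1] if it[1] is not None else shadow_bid),
        reverse=True,
    )[:slots]
    revenue = clicks = 0.0
    for position, (ectr, cpc) in enumerate(ranked, start=1):
        discount = math.log2(1 + position)  # DCG-style position discount
        if cpc is None:
            clicks += ectr / discount
        else:
            revenue += ectr * cpc / discount
    return revenue, clicks


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Keep the points that no other point beats on both revenue and clicks."""
    frontier = []
    for bid, rev, clk in points:
        dominated = any(
            r >= rev and c >= clk and (r > rev or c > clk)
            for _, r, c in points
        )
        if not dominated:
            frontier.append((bid, rev, clk))
    return frontier


# One hypothetical request: two ads and two organic items, two slots.
request = [(0.010, 0.50), (0.020, 0.10), (0.030, None), (0.015, None)]
results = []
for bid in (0.0, 0.05, 0.12, 0.50):
    rev, clk = simulate(request, bid, slots=2)
    results.append((bid, rev, clk))
frontier = pareto_frontier(results)
for point in frontier:
    print(point)
```

In a real analysis the two totals would be summed over many logged requests per shadow bid, and the static allocation would be plotted as one more point to compare against the frontier.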
But how can we find the optimal organic eCPC for this website? Unfortunately, there is no definitive answer (unless you know what the average RPM is for all of your website’s revenue streams after clicking on an organic article). In most cases, websites choose to withhold this information from Taboola.
We had to use arbitrary proxy targets, such as preserving the current revenue/PV ratio, or keeping the same number of pageviews generated as in the static setting while maximizing revenue. Regardless of the goal, we used the eCPC obtained from the Pareto analysis as our initial estimate.
Controller
The open web is an ever-changing environment, and one aspect of that is that the average CPM isn’t static either. This means we need to continually update our organic eCPC for each publisher. To do this, we implemented a simple “controller” that updates the eCPC once a day based on a week’s worth of data, adjusting it according to the optimization goal. For example, if we target a specific number of clicks per user and exceed it by 5%, the new value would be (1 - 0.05 * learning_rate) * current organic eCPC.
In most cases, using the initial organic eCPC from the offline analysis, it takes a week or two to match our optimization goal.
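The daily update described above can be sketched as a single function. This is a minimal sketch for an engagement target such as clicks per user; the function name, parameter names, and the learning rate value are hypothetical.

```python
def update_organic_ecpc(current_ecpc: float,
                        observed: float,
                        target: float,
                        learning_rate: float) -> float:
    """One daily controller step for an engagement target (e.g. clicks per user).

    Overshooting the target lowers the organic eCPC, undershooting raises it.
    """
    if target <= 0:
        raise ValueError("target must be positive")
    relative_error = (observed - target) / target
    return (1 - relative_error * learning_rate) * current_ecpc


# Target exceeded by 5%, with a hypothetical learning rate of 0.5.
new_ecpc = update_organic_ecpc(
    current_ecpc=0.12, observed=1.05, target=1.0, learning_rate=0.5
)
print(new_ecpc)
```

A smaller learning rate makes the controller slower to converge but less sensitive to noisy weeks, which is the usual trade-off for this kind of proportional update.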
Calibration challenges
Another major challenge we encountered was calibration. Ensuring the calibration of our organic and advertising models is paramount. Each location we support must have accurately calibrated predictions. For example, if our models predict a 10% chance of an article being clicked, that prediction should closely align with real-world results for both organic and sponsored articles. If not, we’ll end up with suboptimal rankings.
To meet this challenge, we used two key strategies. First, we revamped the way we stratify data in our model, moving from stratification by publisher to stratification by widget, so that the model can make more accurate corrections. Second, we applied isotonic regression [1] to align predicted probabilities with observed outcomes for all of our models.
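To make the second strategy concrete, here is a minimal pure-Python sketch of isotonic regression using the pool-adjacent-violators algorithm. It is illustrative only and not the production setup: in practice a library implementation such as scikit-learn's `IsotonicRegression` would normally be used, and the function names and toy data below are hypothetical. The sketch returns a step function, whereas library versions typically interpolate between points.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple


def fit_isotonic(preds: List[float], labels: List[int]) -> Tuple[List[float], List[float]]:
    """Pool-adjacent-violators fit; returns (block upper bounds, calibrated values)."""
    # Pool identical predictions first so each prediction maps to one value.
    totals: Dict[float, List[float]] = {}
    for p, y in zip(preds, labels):
        entry = totals.setdefault(p, [0.0, 0.0])
        entry[0] += y
        entry[1] += 1.0
    # Each block holds [sum of labels, count, largest prediction in the block].
    blocks: List[List[float]] = []
    for p in sorted(totals):
        blocks.append([totals[p][0], totals[p][1], p])
        # Merge backwards while block means are not strictly increasing.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] >= blocks[-1][0] / blocks[-1][1]:
            s, n, upper = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
            blocks[-1][2] = upper
    bounds = [b[2] for b in blocks]
    values = [b[0] / b[1] for b in blocks]
    return bounds, values


def calibrate(pred: float, bounds: List[float], values: List[float]) -> float:
    """Step-function lookup: use the first block whose upper bound covers pred."""
    idx = min(bisect_left(bounds, pred), len(values) - 1)
    return values[idx]


# Toy data: raw model scores and observed clicks (1) or no clicks (0).
raw = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
clicked = [0, 0, 1, 0, 1, 1]
bounds, values = fit_isotonic(raw, clicked)
print(bounds, values)
print(calibrate(0.35, bounds, values))
```

Because the fitted mapping is monotone, it corrects the scale of the predictions without changing the order of items scored by the same model, which matters when organic and sponsored predictions are compared directly.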
Online results
Measuring online impact
Traditional metrics like RPM and CTR are inappropriate in our case, as they pit engagement against revenue. Instead, we focus on clicks per user and revenue per user. We also filter out super users and bots to remove noise from the data. We evaluate the effect both on the affected widgets and on full page views to get the complete picture.
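The sketch below shows one way such per-user metrics could be computed. The event layout, the bot list, and the click cap used to define a super user are all hypothetical; the post does not say how super users and bots are identified.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Event = Tuple[str, int, float]  # (user_id, clicks, revenue)


def per_user_metrics(events: Iterable[Event],
                     bot_ids: Set[str],
                     max_clicks_per_user: int) -> Dict[str, float]:
    """Aggregate per user, drop bots and super users, then average."""
    clicks: Dict[str, int] = defaultdict(int)
    revenue: Dict[str, float] = defaultdict(float)
    for user_id, n_clicks, rev in events:
        if user_id in bot_ids:
            continue
        clicks[user_id] += n_clicks
        revenue[user_id] += rev
    # Drop super users: anyone above the (hypothetical) click cap.
    kept = [u for u in clicks if clicks[u] <= max_clicks_per_user]
    if not kept:
        return {"users": 0, "clicks_per_user": 0.0, "revenue_per_user": 0.0}
    return {
        "users": len(kept),
        "clicks_per_user": sum(clicks[u] for u in kept) / len(kept),
        "revenue_per_user": sum(revenue[u] for u in kept) / len(kept),
    }


events = [
    ("u1", 2, 0.10),
    ("u2", 1, 0.00),
    ("u1", 1, 0.05),
    ("bot", 50, 0.00),
    ("u3", 400, 3.00),  # exceeds the cap, treated as a super user
]
metrics = per_user_metrics(events, bot_ids={"bot"}, max_clicks_per_user=100)
print(metrics)
```

Computing the same metrics for the test and control groups and comparing them gives the relative lift per user, without the denominator effects that make RPM and CTR misleading here.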
Initial findings indicate a promising 8.7% increase in RC clicks per user and 10.5% increase in revenue per user. Additionally, there is a noticeable decrease in ad density, reducing the number of sponsored articles by 18%, indicating a more balanced and optimized ad space.
Conclusion
In conclusion, our unified approach not only improves revenue potential, but also fosters a more personalized and engaging user experience, strengthening Taboola’s position as a leader in native advertising optimization.
Key takeaways
- The unified ranking approach works very well: it successfully optimizes the allocation of ad space, producing significant increases in both engagement and revenue.
- Calibration is crucial; we have to consider the many different widgets on different sites with different layouts, as well as users in different countries with different average CPCs and ad inventories.
- Choosing the right metrics is essential: focusing on metrics like RC clicks per user and revenue per user allows for accurate assessments.
Future work
- Address weekly seasonality in organic eCPC: users behave differently on weekends, which changes the average CPM.
- Account for the gap effect described by LinkedIn [2].
- Analyze the long-term effects.
- Apply this approach to different types of content, such as e-commerce and paywall pieces.
Additional information
Offline analysis
Simply put, in the offline analysis, I used our data logs to simulate what the ranking would have been for a given shadow bid and our eCTR predictions, and summed the estimated revenue and organic clicks (ranking organic items by eCTR * oCPC, where oCPC is the organic eCPC).
There are two caveats that could differentiate analysis from reality:
- I used the DCG method, dividing each estimate by log(1 + slot index); this is not precise, and it would have been better to use a precise calibration factor for each slot.
- Some items may have been included that should have been blocked. I’ve removed items that were already marked as blocked, but that might not be enough.
I will add that before running the analysis, I checked the calibration of the organic and sponsored CTR predictions. Both seemed very accurate (post-multiplication), with an error of about 3% on average, allowing us to trust the accumulated metrics.
References
[1] Guo, C., Pleiss, G., Sun, Y., & Weinberger, K. Q. (2017, July). On calibration of modern neural networks. In International Conference on Machine Learning (pp. 1321-1330). PMLR.
[2] Yan, J., Xu, Z., Tiwana, B., & Chatterjee, S. (2020, August). Ads allocation in feed via constrained optimization. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 3386-3394).