Diversified Bandit: the k-diversified Multiple-Play Recommendation Algorithm

Date: 
March 2017
Abstract: 
Recommender systems (RS) aim at recommending either one or several items simultaneously: the former type of RS is called single-play RS, while the latter is called multiple-play RS. The efficiency of multiple-play RS can be evaluated by two different criteria: the \emph{click-through rate} (CTR), which measures the proportion of recommendations that are followed by a click; and the \emph{proportion of abandonment}, which measures the proportion of sessions that end without any click. Minimizing abandonment is at least as important as maximizing the CTR. It requires diversifying the recommendations so as to meet, as often as possible, one of the various expectations of the users. This paper focuses on this issue. The most efficient models in the literature rely on machine learning to determine which items to recommend at each time step. While standard machine-learning approaches separate the learning phase (where the best model is estimated from initial interactions) from the recommending phase (where this model is applied repeatedly), the framework of \emph{multi-armed bandits} inherently mixes the exploration of the users' preferences with the exploitation of the learnt preferences. Within the framework of multi-armed bandit algorithms, we propose two variants of a new approach to address the diversification issue. The first, which we name the 2-diversified bandit algorithm, uses the conditional probability that a user clicks on an item given that the previous item in the list was not clicked. The second, named the k-diversified bandit algorithm, generalizes the first by using the conditional probability of a click on an item given that all items recommended at previous positions were not clicked. We show on two reference real-world data sets that our approaches reach good performance ten times faster and reduce the proportion of abandonment by $1.5\%$ to $6\%$.
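To make the idea of conditioning on earlier non-clicks concrete, the sketch below builds a recommendation list greedily, position by position: position 1 uses the marginal click estimate of each item, and position j uses the click estimate conditioned on the items placed at positions 1..j-1 having been left unclicked. The UCB-style exploration bonus and the class and method names (`KDiversifiedBandit`, `select_list`, `update`) are illustrative assumptions, not the authors' exact algorithm.

```python
import math
import random
from collections import defaultdict


class KDiversifiedBandit:
    """Sketch of a k-diversified multiple-play bandit.

    Click statistics are kept per (context, item), where the context is the
    set of items shown earlier in the list and left unclicked.  A UCB-style
    bonus handles the exploration/exploitation trade-off (an assumption of
    this sketch, not necessarily the index used in the paper).
    """

    def __init__(self, n_items, list_size):
        self.n_items = n_items
        self.k = list_size
        self.displays = defaultdict(lambda: defaultdict(int))
        self.clicks = defaultdict(lambda: defaultdict(int))
        self.t = 0  # number of recommendation sessions so far

    def _ucb(self, context, item):
        n = self.displays[context][item]
        if n == 0:
            return float("inf")  # force initial exploration
        mean = self.clicks[context][item] / n
        return mean + math.sqrt(2.0 * math.log(self.t + 1) / n)

    def select_list(self):
        """Greedily build the recommendation list position by position."""
        self.t += 1
        chosen = []
        for _ in range(self.k):
            context = frozenset(chosen)  # items above, assumed unclicked
            candidates = [i for i in range(self.n_items) if i not in chosen]
            chosen.append(max(candidates, key=lambda i: self._ucb(context, i)))
        return chosen

    def update(self, chosen, clicked_position):
        """Update conditional statistics from user feedback.

        clicked_position is the index of the first clicked item in the list,
        or None if the session ends with abandonment (no click at all).
        """
        last = len(chosen) if clicked_position is None else clicked_position + 1
        for pos in range(last):
            context = frozenset(chosen[:pos])
            self.displays[context][chosen[pos]] += 1
            if pos == clicked_position:
                self.clicks[context][chosen[pos]] += 1


# Toy usage: three user types, each clicking a different preferred item.
bandit = KDiversifiedBandit(n_items=10, list_size=3)
for _ in range(1000):
    preferred = random.choice([0, 1, 2])
    rec = bandit.select_list()
    click = rec.index(preferred) if preferred in rec else None
    bandit.update(rec, click)
```

Because each position is scored conditionally on the non-click of the items above it, the list tends to cover complementary user expectations rather than stacking near-duplicates of the single most clicked item, which is the mechanism by which abandonment is reduced.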