quarta-feira, 16 de maio de 2012

Video Suggestion and Discovery for Youtube: Taking Random Walks Through the View Graph

I just finished reading this paper about video recommendation written by some googlers back in 2008. It seems like the authors were somehow involved with Youtube at the time, because it is used as both application scenario and main motivation. The idea of the paper is recommending videos based on a random walk through user-video (i.e., bipartite) graph.


Video recommendation is a key problem for Youtube and other video distribution systems. However, the volume of data that such systems have to deal with together with a highly skewed popularity distribution and the dynamic nature (i.e., new users and content all the time) of the interactions involved produce specific requirements, some of which are addressed in this paper. The basic idea is recommending videos based on co-views (i.e., the way in which videos are viewed by the same users). As an example, because videos 1 and 2 were co-viewed by users A and B, we may recommend video 1 to users who have watched video 2, and vice-versa.

The family of recommendation algorithms introduced by the authors are based on a general adsorption algorithm. They argue that this algorithm does not have certain undesired properties that nearest neighbor algorithms may have due to their particular distance metric (e.g., shortest path, commute time). Some of these metrics may produce potentially bad recommendations because of high-degree nodes - several paths are expected to pass through these vertices -, may be too expensive to compute or not admit incremental updates. They describe three views of this general adsorption algorithm: the first is based on averaging, the second on random walks and the last one is a linear system formulation.

Adsorption via averaging: The input graph is G = (V,E,w), where V are nodes from the bipartite graph, edges represent views and w are weights. Shadow vertices V' are created for each video in V. Moreover, we associate a probability distribution over the set of videos for each vertex. Vertices from V have no initial distribution defined. Instead, an edge from v' in V' to its respective vertex in v is inserted in the graph. The algorithm works as follows:

 
Adsorption via random walks: If we reverse the edges from G, the above algorithm is equivalent to a random walk algorithm over the bipartite graph. Shadow vertices become absorbing states. Simple as that.


Adsorption via Linear Systems: This part is very interesting. The authors formulate a version of the averaging algorithm through a linear system. Pretty cool.


The authors also propose the use of injection and dummy probabilities as means to improve the proposed recommendation model. The injection probability is basically a way to reduce the probability of reaching a shadow node corresponding to a popular video. Other properties of the video may also be considered in order to set a proper injection probability. Dummy probabilities allow a random walk to be abandoned in case a node corresponding to a popular video or a very active users are reached.

The proposed algorithms are evaluated using a dataset with 1.3 million videos and 1.1 million users from Youtube. Training and test sets (50/50%) were defined according to view time. Evaluation metrics are based on average precision and recall along ranks of videos for each user. Three baselines are considered:

Global popularity (GP): Recommends videos based on popularity;
Local popularity (LP): Recommends videos based on popularity and co-views;
Local rare (LR): Similar to LP, but weights co-views using Jaccard coefficient of the shared users.

Two variations of the proposed algorithm, both implemented as the averaging version, are evaluated:

ADSORPTION-N: Without dummy probability
ADSORPTION-D: dummy probability set as 1/4 (don't follow, is it constant?).

Results show that LP is the best baseline algorithm in general. ADSORPTION-N and ADSORPTION-D seem to achieve similar recall and precision values. However, a deeper analysis show that, in fact, ADSORPTION-D is more effective than ADSORPTION-N for more active users (the more active, the better). For users with a single view, GP makes the best recommendations. For a small number of views, LP is the best algorithm. Because the distribution of the number of views per user is skewed, these characteristics were not noticed at first.

This is a good paper. Well-defined problem, intuitive solution, conclusive experiments. The authors sometimes focused on details that are not essential to understand the paper and this particular parts are kind of boring. Also, I'm not sure whether the evaluation procedures are the standard or not.

Link: http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/34407.pdf

Nenhum comentário:

Postar um comentário