Monday, July 23, 2012

The structure of online diffusion networks

This research was conducted by researchers who were working at Yahoo! Labs at the time. One of the authors, Duncan Watts, is now affiliated with Microsoft Research. The main topic of the paper is viral marketing. More specifically, the authors study how much of the total adoption of an object (e.g., a URL or a product) is due to viral spreading. To support their findings, they analyze spreading processes in 7 different scenarios.

For a long time, the research community has tried to model diffusion as an epidemic process. These disease-like models are expected to produce an "S-shaped" cumulative adoption curve. Threshold models, in which each user has a minimum adoption level that must be reached before he or she adopts, later became popular as well. Network models of adoption, in which users' actions depend on a small set of neighbors and both local and global structural features can influence the size and likelihood of cascades, also attracted the interest of the research community. In particular, these models motivated several studies on selecting a small number of nodes that maximize influence over the network. From an optimization perspective, other studies focused on the trade-off between the cost of activating a seed set and the global influence those nodes achieve in the diffusion process. However, empirical evidence that diffusion processes actually generate large cascades is still lacking.
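
To make the threshold idea concrete, here is a minimal Python sketch of a deterministic fractional threshold model on a network. It is an illustration, not something taken from the paper: the function name `threshold_cascade`, the adjacency-set representation, and the rule that a node adopts once the fraction of its neighbors who have adopted reaches its personal threshold are all assumptions made for the example.

```python
from typing import Dict, Set


def threshold_cascade(neighbors: Dict[str, Set[str]],
                      thresholds: Dict[str, float],
                      seeds: Set[str]) -> Set[str]:
    """Hypothetical fractional threshold model (illustrative only).

    A node adopts once the fraction of its neighbors that have already
    adopted is at least its threshold. Seeds adopt unconditionally.
    Returns the final set of adopters.
    """
    adopted = set(seeds)
    changed = True
    # Keep sweeping over all nodes until a full pass adds no new adopter.
    while changed:
        changed = False
        for node, nbrs in neighbors.items():
            if node in adopted or not nbrs:
                continue
            frac = len(nbrs & adopted) / len(nbrs)
            if frac >= thresholds[node]:
                adopted.add(node)
                changed = True
    return adopted
```

Because adoption never reverses in this model, the fraction of adopting neighbors can only grow, so the final set does not depend on the order in which nodes are scanned. On a path `a-b-c-d` seeded at `a` with every threshold at 0.5, adoption spreads to all four nodes; raising `b`'s threshold to 0.6 stops the cascade at the seed.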

A cascade consists of a seed node, which took the action independently of other nodes, and a set of non-seed nodes that the seed influenced, either directly or indirectly, to take the same action. In this paper, the authors represent cascades as trees: when an adopter has more than one potential parent, the source of influence is always taken to be the earliest one. The study characterizes these cascades in terms of size, depth, and the proportion of the total adoption due to each isomorphism class of cascades (i.e., each distinct tree shape).
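
The tree construction described above can be sketched roughly as follows. This is a hypothetical illustration, not the authors' code: `assign_parents`, `cascade_stats`, the `adoptions` map (user to adoption time), and the `contacts` map (user to the set of users whose adoptions he or she could have observed) are all assumptions. Each adopter's parent is the earliest-adopting contact who adopted strictly before it; adopters with no such contact are seeds. Size counts every node in a tree, and depth is the largest distance from the seed.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple


def assign_parents(adoptions: Dict[str, float],
                   contacts: Dict[str, Set[str]]) -> Dict[str, Optional[str]]:
    """Map each adopter to its parent, or None if it is a seed.

    The parent is the contact who adopted earliest among those who
    adopted strictly before the user (hypothetical attribution rule
    following the "earliest parent" description).
    """
    parents: Dict[str, Optional[str]] = {}
    for user, t in adoptions.items():
        earlier = [(adoptions[c], c) for c in contacts.get(user, set())
                   if c in adoptions and adoptions[c] < t]
        # min over (time, user) tuples picks the earliest adopter;
        # ties in time are broken by user ID.
        parents[user] = min(earlier)[1] if earlier else None
    return parents


def cascade_stats(parents: Dict[str, Optional[str]]) -> Dict[str, Tuple[int, int]]:
    """Return {seed: (size, depth)} for every cascade tree."""
    children: Dict[str, List[str]] = defaultdict(list)
    for child, parent in parents.items():
        if parent is not None:
            children[parent].append(child)
    stats: Dict[str, Tuple[int, int]] = {}
    for seed, parent in parents.items():
        if parent is not None:
            continue  # only seeds start a cascade
        size, depth = 0, 0
        stack = [(seed, 0)]  # (node, distance from seed)
        while stack:
            node, level = stack.pop()
            size += 1
            depth = max(depth, level)
            stack.extend((c, level + 1) for c in children[node])
        stats[seed] = (size, depth)
    return stats
```

Since a parent always adopted strictly earlier than its child, the parent links cannot form cycles, so every tree walk terminates.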

The datasets used in this work are:

  • Yahoo! Kindness: a website that asked users to describe their acts of kindness and share them through Facebook, Twitter, etc. Diffusion is tracked through a unique URL given to each user. The dataset contains 59K users.
  • Zync: a plugin for watching videos with contacts in the Yahoo! IM application. A diffusion event occurs whenever an invitation to share a stream is accepted. The dataset contains 374K users.
  • The secretary game: an online game. Players were encouraged to share their (unique) game URL. The total number of adopters is 2.9K.
  • Twitter news stories: a collection of 80K news stories posted on Twitter by NYT, CNN, MSNBC, Y! News, and Huffington Post. An adoption means sharing a URL, and there are 288K adoption events.
  • Twitter videos: 540K YouTube videos posted on Twitter. An adoption occurs whenever a user posts a link to a video. There are 1.3M adoption events.
  • Friendsense: a third-party Facebook application that asked users about their political views and about their beliefs regarding their friends' political views. The dataset contains 2.5K users, 100K answers, and 80 questions.
  • Yahoo! Voice: a paid VoIP service supported by Yahoo! Messenger. An adopter is someone who bought Y! Voice credits, and cascades are identified from interactions in the Yahoo! IM network. The dataset contains 1.8M users.


Results show that, in general, cascade trees are small: the most frequent cascade size is 1, and the cascade size distribution is highly skewed. The authors investigate whether a small number of large cascades account for a significant proportion of the adoptions and find that this is not the case: less than 10% of the adoptions occur in cascades with more than 10 nodes. Moreover, the vast majority of adoptions occur within one generation of the seed node.
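
Statistics of this kind can be computed on any collection of cascade trees with a short routine like the one below. It is a sketch under stated assumptions: the name `summarize_cascades` and the input format (each cascade given as a list of node levels, with 0 for the seed and 1 for its direct children) are hypothetical, and the default cutoff of 10 nodes simply mirrors the one mentioned above. It returns the most frequent cascade size, the share of adoptions that fall in cascades with more than `big` nodes, and the share of adoptions at most one generation away from the seed.

```python
from collections import Counter
from typing import List, Tuple


def summarize_cascades(cascades: List[List[int]],
                       big: int = 10) -> Tuple[int, float, float]:
    """Summarize cascades given as lists of node levels (0 = seed).

    Returns (most frequent cascade size,
             share of adoptions in cascades larger than `big`,
             share of adoptions within one generation of the seed).
    """
    sizes = Counter(len(c) for c in cascades)
    total = sum(len(c) for c in cascades)  # every node is one adoption
    in_big = sum(len(c) for c in cascades if len(c) > big)
    near_seed = sum(1 for c in cascades for level in c if level <= 1)
    mode_size = sizes.most_common(1)[0][0]
    return mode_size, in_big / total, near_seed / total
```

Counting every node as one adoption means seeds are included in the totals, so a collection made only of singleton cascades yields a "within one generation" share of 1.0.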

Based on these results, the authors make several interesting points about the impact of their findings on our understanding of viral marketing. In particular, they challenge the common belief that a small set of users can produce large cascades in real scenarios. On the other hand, they also discuss how different results might be found when adoptions are associated with explicit incentives, or in scenarios where adoption is automatic, such as email viruses. The following figure shows some of the largest cascades found in the Twitter datasets. Colors represent cascade levels (i.e., distance from the seed node).


This paper is very well written and makes a very clear contribution. I appreciated the literature review, which gives a historical overview of information diffusion research. However, the paper stops short of proposing alternative models able to capture the main properties of the adoption cascades it found. This further step seems to have been left as future work. Let's wait and see what comes next.

Link: http://5harad.com/papers/diffusion.pdf
