Tuesday, March 6, 2012

The Role of Social Networks in Information Diffusion

This paper was accepted at the upcoming WWW conference, which will take place in Lyon, France. Its authors are from Facebook and the University of Michigan (Lada Adamic). The paper addresses a gap in the study of social influence and information propagation in social networks: most existing studies do not distinguish influence/propagation from other sources of correlation (e.g., homophily). Although this problem has been known for years, previous studies had no access to a dataset large enough to enable this kind of analysis. Thanks to Facebook, these guys were able to run an experiment that quantifies the effect of social relationships on the diffusion of information.

Given a pair of users (A,B) who are friends, the researchers considered two scenarios in which A posts a URL on Facebook: (1) the URL does not appear in B's news feed, and (2) the URL appears in B's news feed. For a large set of subject-URL pairs, selected at time of display, the researchers were able to randomly choose which subjects would be exposed to the URL and which would not. The numbers of subjects, URLs, and subject-URL pairs considered were 253M, 75M, and 1B, respectively. The collection period ranged from August 14th to October 4th.

Social exposure x diffusion: To account for possible bias towards frequent URLs or active users, the authors used the bootstrap average by URL to compute the likelihood of sharing for the feed and no feed groups. Bootstrapping is a statistical technique that assesses the variation of an estimate by repeatedly resampling the data. They found that the likelihood of sharing is 0.191% for the feed group and 0.025% for the no feed group. Individuals in the feed condition are 7.37 times more likely to share.
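To make the bootstrap idea concrete, here is a minimal sketch, not the authors' actual procedure. It assumes each observation is a hypothetical (url, shared) record, computes one sharing rate per URL so that frequent URLs do not dominate, and then resamples the URLs with replacement. The function names and the 95% percentile interval are my own choices for illustration.

```python
import random


def share_rate_by_url(records):
    """Return {url: fraction of exposures that led to a share}.

    `records` is an iterable of (url, shared) pairs; this layout is an
    assumption for illustration, not the paper's data format.
    """
    totals, shares = {}, {}
    for url, shared in records:
        totals[url] = totals.get(url, 0) + 1
        shares[url] = shares.get(url, 0) + (1 if shared else 0)
    return {url: shares[url] / totals[url] for url in totals}


def bootstrap_mean(values, n_resamples=1000, seed=0):
    """Bootstrap the mean of `values` by resampling with replacement.

    Returns (estimate, (low, high)), where the interval is a simple
    percentile interval covering roughly 95% of the resampled means.
    """
    rng = random.Random(seed)
    values = list(values)
    means = []
    for _ in range(n_resamples):
        sample = [rng.choice(values) for _ in values]
        means.append(sum(sample) / len(sample))
    means.sort()
    estimate = sum(means) / len(means)
    low = means[int(0.025 * n_resamples)]
    high = means[int(0.975 * n_resamples) - 1]
    return estimate, (low, high)
```

Running this once on the feed records and once on the no feed records gives two averaged sharing rates whose ratio plays the role of the "times more likely to share" figure above.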

Temporal clustering: Subjects in the feed group share the information sooner than those in the no feed group. The median sharing latency is 6 hours for the feed group and 20 hours for the no feed group.

Multiple sharing friends: One important aspect in the literature on social influence and information diffusion is the effect of the number of active/affected/sharing friends. The authors found that the sharing probability increases with the number of sharing friends in both the feed and no feed groups. However, the ratio between the sharing probabilities in the feed and no feed groups decreases with the number of sharing friends.

Tie strength: The frequency of private communication, the frequency of public online interaction, the number of appearances of users in the same photo, and the number of times users comment on the same posts were considered as tie strength measures. Data from the three months prior to the experiment was used in this evaluation. The goal is to revisit the concepts of strong and weak ties from Granovetter's seminal 1973 paper, The Strength of Weak Ties, which found that weak ties play an important role in social networks. The results presented in this paper also showed that the strength of ties is positively correlated with the probability of sharing for both the feed and no feed conditions, but this effect diminishes with tie strength.

Collective impact of ties: On the same topic as the previous section, here the authors compute the effect of exposure to information shared by friends with tie strength k, which is given by the following equation:

ATET(k) = p(k, feed) - p(k, no feed)

where p is the probability of sharing. This value is multiplied by f(k), which is the fraction of links displayed in all users' feeds that were posted by friends with tie strength k. Ties for which k = 0 were classified as weak and those with k > 0 were classified as strong. The results show that most of the diffusion occurs through weak ties.
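The weighting above can be sketched in a few lines. This is only an illustration under my own assumptions: the dictionaries are keyed by tie strength k, the function names are hypothetical, and normalizing the weighted effects into shares that sum to 1 is a convenience I added to compare weak and strong ties.

```python
def atet(p_feed, p_no_feed):
    """Effect of exposure: p(k, feed) - p(k, no feed) for one value of k."""
    return p_feed - p_no_feed


def diffusion_share_by_tie(p_feed, p_no_feed, f):
    """Weight ATET(k) by f(k) and normalize across tie strengths.

    p_feed, p_no_feed: {k: probability of sharing in each condition}
    f: {k: fraction of displayed links posted by friends with strength k}
    Returns {k: share of the exposure-driven sharing attributed to k}.
    """
    weighted = {k: atet(p_feed[k], p_no_feed[k]) * f[k] for k in f}
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("total weighted effect is zero; cannot normalize")
    return {k: w / total for k, w in weighted.items()}


# Made-up numbers, NOT taken from the paper: k = 0 is a weak tie, k = 1 a strong one.
example = diffusion_share_by_tie(
    p_feed={0: 0.002, 1: 0.004},
    p_no_feed={0: 0.0002, 1: 0.001},
    f={0: 0.9, 1: 0.1},
)
```

In this toy setup the strong tie has the larger per-link effect, but weak ties account for so many more of the displayed links that they still carry most of the weighted total, which is the same mechanism behind the paper's conclusion.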

This paper is not bad. Handling a dataset like the one used in this paper is a once-in-a-lifetime opportunity for an ordinary researcher. I'm just a little bit disappointed because when I read the title and the list of authors, I pictured a better paper in my head. I was expecting a richer statistical analysis, a bunch of cool visualizations, and also some kind of influence/propagation model based on the data. I did not buy the sections about tie strength, which were probably intended to be the high point of the paper.

Link: http://arxiv.org/abs/1201.4145
