Friday, May 18, 2012

Influence and Passivity in Social Media

This work studies the problem of identifying influential users. The authors are from Cornell University, EPFL (Switzerland), and HP Social Computing Labs, and include Bernardo Huberman. The concept of passivity, which captures the intuitive idea that some users are harder to influence than others, is applied as a means of measuring user influence in social media applications more accurately.

The study is based on a Twitter dataset containing URLs posted during a 300-hour interval ('http' was used as the keyword in the crawling process). The authors crawled 15M URLs along with metadata (e.g., followers and followees) about the users who posted them.



According to the authors, passivity is a barrier to propagation. As evidence of passivity on Twitter, they show the different levels of retweet activity among users (see figure above): while some users retweet a lot, others rarely retweet. Moreover, it is relevant to evaluate a user's relative influence over the whole network, instead of considering only direct influence (e.g., retweets). More specifically, this work is based on the following assumptions:

1. A user's influence score depends on the number of people she influences as well as their passivity.
2. A user's influence score depends on how dedicated the people she influences are.
3. A user's passivity score depends on the influence of those who she's exposed to but not influenced by.
4. A user's passivity score depends on how much she rejects other users' influence compared to everyone else.

An algorithm for computing users' influence scores, called IP (influence-passivity), is introduced. IP receives a graph G = (N,E,W) as input, where N is a set of nodes, E is a set of arcs, and W is a set of arc weights. The weight of an arc e = (i,j) is defined as the following ratio:

w(i,j) = influence i exerts on j / influence i attempted to exert on j
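To make the ratio concrete, here is a minimal sketch of how such weights could be computed from retweet data. It assumes that the influence i attempts to exert is the number of URLs i posted, and that the influence i exerts on j is the number of those URLs that j retweeted. The function name `edge_weights` and the input layouts are hypothetical, chosen only for illustration.

```python
from collections import Counter

def edge_weights(posts, retweets):
    """Compute w(i, j) for every arc with at least one retweet.

    posts:    iterable of (user, url) pairs, one per distinct URL a user posted
    retweets: iterable of (source, retweeter, url) triples
    Both layouts are assumptions made for this sketch.
    """
    # Influence i attempted to exert: number of URLs i posted.
    attempted = Counter(user for user, _url in posts)
    # Influence i exerted on j: number of i's URLs that j retweeted.
    exerted = Counter((src, rt) for src, rt, _url in retweets)
    return {
        (i, j): count / attempted[i]
        for (i, j), count in exerted.items()
        if attempted[i] > 0
    }

posts = [("a", "u1"), ("a", "u2"), ("a", "u3"), ("a", "u4")]
retweets = [("a", "b", "u1"), ("a", "b", "u2"), ("a", "c", "u3")]
weights = edge_weights(posts, retweets)
```

In this toy example, b retweeted two of a's four URLs and c retweeted one, so the arc (a, b) gets weight 0.5 and the arc (a, c) gets weight 0.25.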

This influence can be based on several kinds of evidence, such as retweets and other co-occurrences of content in general. An acceptance rate (u_i,j) and a rejection rate (v_i,j) are defined for arcs as follows:
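As I recall them from the paper (worth checking against the original), the two rates for an arc (i,j) can be written as below. The acceptance rate normalizes w(i,j) by everything j accepted from anyone, and the rejection rate normalizes 1 - w(i,j) by everything that was rejected from i.

```latex
u_{i,j} = \frac{w_{i,j}}{\sum_{k:(k,j) \in E} w_{k,j}}
\qquad
v_{i,j} = \frac{1 - w_{i,j}}{\sum_{k:(i,k) \in E} \left(1 - w_{i,k}\right)}
```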



The outputs of IP are the influence (I) and passivity (P) functions:
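Again as I recall from the paper, and using u and v for the acceptance and rejection rates of an arc, the two functions are defined in terms of each other. A user's influence sums the passivity of those she influences, weighted by acceptance rates, and a user's passivity sums the influence of those she is exposed to, weighted by rejection rates.

```latex
I_i = \sum_{j:(i,j) \in E} u_{i,j} \, P_j
\qquad
P_i = \sum_{j:(j,i) \in E} v_{j,i} \, I_j
```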


The IP algorithm solves this recursive formulation through an iterative process.
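The iterative process can be sketched in a few lines of Python. This is my own minimal reading of the algorithm, not the authors' code. It assumes that the acceptance rate of an arc (i,j) is w(i,j) divided by the total weight entering j, that the rejection rate is 1 - w(i,j) divided by the total rejected weight leaving i, that all scores start at 1, and that both score vectors are normalized to sum to 1 after each round. The function name `influence_passivity` and the fixed iteration count are hypothetical.

```python
def _normalise(scores):
    """Scale scores so they sum to 1 (left unchanged if all are zero)."""
    total = sum(scores.values())
    if total == 0:
        return scores
    return {node: value / total for node, value in scores.items()}

def influence_passivity(weights, iterations=50):
    """weights maps each arc (i, j) to w(i, j) in [0, 1]."""
    nodes = sorted({node for arc in weights for node in arc})

    # Normalisers: total accepted by j (incoming), total rejected from i (outgoing).
    accepted = dict.fromkeys(nodes, 0.0)
    rejected = dict.fromkeys(nodes, 0.0)
    for (i, j), w in weights.items():
        accepted[j] += w
        rejected[i] += 1.0 - w

    # Acceptance rate u and rejection rate v for every arc (assumed definitions).
    u = {(i, j): (w / accepted[j] if accepted[j] > 0 else 0.0)
         for (i, j), w in weights.items()}
    v = {(i, j): ((1.0 - w) / rejected[i] if rejected[i] > 0 else 0.0)
         for (i, j), w in weights.items()}

    influence = dict.fromkeys(nodes, 1.0)
    passivity = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        # Passivity of i: influence of those i is exposed to, weighted by rejection.
        new_p = dict.fromkeys(nodes, 0.0)
        for (j, i), rate in v.items():
            new_p[i] += rate * influence[j]
        # Influence of i: passivity of those i reaches, weighted by acceptance.
        new_i = dict.fromkeys(nodes, 0.0)
        for (i, j), rate in u.items():
            new_i[i] += rate * new_p[j]
        passivity = _normalise(new_p)
        influence = _normalise(new_i)
    return influence, passivity

toy = {("a", "b"): 0.9, ("a", "c"): 0.1, ("b", "c"): 0.5}
influence, passivity = influence_passivity(toy)
```

In the toy graph, c has no outgoing arcs and so ends up with zero influence, while a is exposed to no one and ends up with zero passivity. A real implementation would probably stop on convergence rather than after a fixed number of rounds.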


Therefore, according to IP, an influential user is someone able to get content propagated by passive users and a passive user is someone who rejects content from influential users.

As baselines for IP, they considered PageRank computed over a version of G with all edges inverted, the Hirsch index (H-index), the number of followers of users, and the average number of retweets of users. Using data from Bit.ly, they evaluated how well each score correlates with the number of clicks a URL gets. The results show that IP predicts user attention (i.e., number of clicks) better than the other metrics, for graphs based on both retweets and co-mentions. It is also shown that influence is not significantly correlated with the number of followers (popularity) a user has.

In a case study, the authors present the most influential and the most passive users in the dataset. The most influential user is mashable (Social Media Blogger). In general, the users ranked as influential look intuitively plausible, although this kind of result is always prone to subjectivity. Passive users are mostly aggregators and spammers (i.e., robots).

This paper is well written and its contributions are very clear. I think the idea of passivity is interesting, but it requires deeper study. In fact, an open question is: how much better is it to influence a passive user than a non-passive one? In other words, let's say that one user has influenced 100 non-passive users while another has influenced 1 very passive one. Which is better in practice? Also, is the number of clicks a URL gets a good measure of influence? I think it works about as well as the number of retweets.

Link: http://www.hpl.hp.com/research/scl/papers/influence/influence.pdf
