terça-feira, 14 de fevereiro de 2012

Social Network Sensors for Early Detection of Contagious Outbreaks

This is not a Computer Science paper. In fact, the authors are affiliated to Art, Medical, and Social Science schools. Therefore, it is inevitable for me to contrast the methodology applied in this paper with the traditional CS research methodology. This paper studies the selection of social sensors for the detection of epidemics. However, instead of having the whole network available, such as in the influence maximization problem, the authors considered a more restricted scenario where they have access only to friends of random individuals (or nominated friends) and compare it against a random selection of sensors. The general idea is the friendship paradox, that can be stated as follows.


Friendship Paradox: The average degree of neighbors of vertices is expected to be higher than the average degree of vertices in a network. Lets say that the degree average degree of vertices from a graph is u. Actually, the mean degree (avg) of neighbors of vertices is u + a quantity defined by the variation of the degree distribution divided by u. The proof follows:


avg = Sum_{uv in E} deg(v) / 2|E| = Sum_{v in V} deg(v)^2 / 2|E|
The degree of each vertex from the graph can be expressed as follows:
degree(v) = u + epsilon_v
Where epsilon_v gives how the degree of v deviates from the average. Replacing the degree of vertices with it, we have:
avg = Sum_{v in V} (u+epsilon_v)^2 / 2|E|
Expanding (u+epsilon_v)^2:
avg = (|V|u^2 + 2uSum_{v in V} epsilon_v + |V|Sum_{v in V} (epsilon_v) ^2) / 2|E|
Sum_{v in V} epsilon_v is 0, by definition and Sum_{v in V} (epsilon_v) ^2 is the variance (sigma^2) of the degree. Therefore:
avg = (|V|u^2 +  |V|sigma^2) / 2|E| = u + sigma^2/u

It is of common knowledge that central nodes are infected sooner during an outbreak.  However, mapping the whole network may not be feasible, what is the main motivation of this work. According to the friendship paradox, while a random sample of vertices have the same mean degree of the population, average degree of neighbors of vertices is higher. As a consequence, friends of friends are more central than random vertices.

In order to evaluate the proposed strategy, they considered a network of 744 students from Harvard during 4 months, for which 319 students are random samples of the students population and 425 are friends of the first group. Cases of influenza (formal and self-reported) among these students were tracked during the period of analysis. The incidence of influenza in the dataset is 8% and 32%, for formal and self-reported cases, respectively.

The authors show that the curve for diagnosed flu incidence in friends of friends is shifted 13.9 days earlier than in time than the curve for random nodes. The curve for self-reported cases is shifted 3.2 days earlier. They also show that their strategy enables a prediction 46 days before the peak in daily incidence in visits to the health  service for the diagnosed cases and 83 days before the peak in daily incidence in self-reported symptoms for the self-reported cases.

The study also considered how different centrality measures could help in the identification of sensors when complete information about the network is available. The metrics considered are:

degree centrality: vertices with above average degree were selected.
betweeness centrality: vertices with above average betweeness were selected.
transitivity: clustering coefficient of a vertex. Vertices with below average were selected.
coreness: degree of a vertex if all its neighbors with smaller degrees are removed. Vertices with higher-than-average coreness were considered.

The shifts achieved by sensors selected using each metric were:

degree: 5.7 days (medical), 8 days (self-reported).
betweeness: 16.5 days (medical), 22.9 days (self-reported).
k-coreness: 4.3 days (medical), 7.5 days (self-reported).
transitivity: 31.9 days (medical), 15 days (self-reported).

I pretty much liked this paper, specially because of the friendship paradox. I spent some time to prove it, but it was worthy. In general, I like the statistical rigor of these medical papers but I do not appreciate some ad-hoc decisions made (e.g., considering vertices with degree above the average as sensors). It would be interesting to consider this problem from a CS perspective, using a larger dataset and an a more extensive evaluation of the proposed strategy.

Link: www.plosone.org/doi/pone.0012948

Nenhum comentário:

Postar um comentário