quarta-feira, 14 de março de 2012

Trust Transitivity in Social Networks

Here goes another paper written by Physicists. This paper studies trust in a social network where trust relationships are transitive. Understanding trust is a very relevant problem in society. In particular, several web applications, such as e-markets and P2P file systems, rely on trustful users in order to provide quality service. Trust transitive plays a special role in non-centralized scenarios, i.e. when there is not a central source of trust available.

Two trust metrics based on transitivity are proposed. The first one, called best trust, is defined as follows:

s_{u,v} = max{PROD_{e_i} c_{e_{i}}}, for all {e_i} in P{u=>v}

According to this metric, the trust of u towards v is the maximum trust product of the direct trusts over the path P{u=>v}is the set of all paths between u and v, {e_i} is the set of edges in a given path, and c_e is a value of direct trust associated to an edge. This metric can be computed using a shortest path algorithm, but the authors argue that it is biased towards high values of trust, so they propose another metric, called pervasive trust:

t_{u,v}=(Sum_w A_{w,v}.(s_{u,v}^{G\{v}})^2.c_{w,v})/(Sum_w A_{w,v}.s_{u,w}^{G\{v}})

Where A is the adjacency matrix, and s_{u,v}^{G\{v}} is the best trust from u to a neighbor of v (v is excluded). In this case, several paths are considered, instead of a single one. Moreover, this metric does not tend to 0 when N approaches to infinite.


The authors give a short discussion on the Eigentrust and the trustWebRank, which are other trust metrics in the literature. These metrics are based on the concept of feedback centrality and are calculated by solving some linear system. Eigentrust requires a stochastic matrix (i.e., the sum of the trust values of the out-edges of all vertices must sum to 1) and the inferred trust values are given by the steady state distribution of the corresponding Markov chain (i.e., the left eigenvector of the stochastic matrix must have an unity eigenvalue). Therefore, the trust values are not personalized and the absolute values of trust are lost. TrustWebRank is based on the PageRank algorithm and also requires a stochastic matrix but is personalized. It also depends on a damping factor that eliminates the contribution of longer paths.

Properties of the two trust metrics presented are studied theoretically. The authors consider a random direct network with arbitrary degree distribution and direct trust values between c and c+dc given by an arbitrary distribution. In this scenario <t> = <s><c> in the limit of large network size. The discussion seems to be heavily based on the paper "Random graphs with arbitrary degree distributions and their applications" that I've not read yet. Therefore, I will not cover the math here. Basically, the authors give a formulation for <s> and compute it through numerical simulations and an analytical solution. In- and out-degree distributions following a Poisson and a Zipf are studied. The authors found a first-order transition from vanishing to positive trust values that correspond to the formation of a giant component in the network when the degree distributions are Poisson. For a Zipf distribution, the transition is of second order and the average trust is smaller than for the Poisson distribution.

Trust transitivity is also studied using data from the PGP (Pretty Good Privacy) network, which is a mechanism for users to attach a signature to the public key of another user for cryptography purposes.  The largest connected component of this network has 40k vertices. It is shown that this network does not grow in a purely random fashion because the power-law distribution of the waiting time between new keys and signatures. Degree distributions seem to follow a power-law. Moreover, the in- and out-degree of  vertices are strongly correlated and the reciprocity is very high. The degree correlation is assortative for intermediary degree values and dissortative for very high and very low degrees. Two paradigms of trust organization are studied: Authority (trust relies on a few universally trusted vertices) vs. Community-based (densely connected components provide trust to each other). The authority-based trust is more efficient but also more prone to forgery. The community-based trust has opposite characteristics. These two mechanisms seem to work together in the PGP network. In order to identify the communities, the authors a modularity maximization algorithm is applied. Two types of communities were found: one that represents the vicinity of an authority vertex and another that represents a "real" community. As means to understand trust transitivity in the PGP network, the authors assign a direct trust c=1/2 to all edges except for a fraction gamma of edges to which c=1. Three scenarios are considered: 1) Edges are selected at random (Random); 2) Edges with the largest betweenness are selected (Authority-centered); and 3) Edges with largest edge clustering are chosen (Community-centered). The authority-centered trust leads to higher values of average trust (<s> and <t>). On the other hand, the community-centered trust leads to lower values of average trust.  Moreover, for random trust distributions, vertices with large degree achieve higher <s> but have uniformly distributed <t>, as desired. For authority-based trust, both small and large degree vertices achieve high trust values. For community-based trust, vertices with intermediary degree achieve higher trust than the small and large degree ones.

This paper is very interesting and relevant. It is similar to another Physics paper I've read some weeks ago. This pattern of studying complex features of simple models seem to be a standard for these kind of paper. In the case of this particular paper, a real dataset (the PGP network) is also analyzed. I found some minor typos that were probably corrected in the "official version" of the paper (this one is from ArXiv). I think that the formulation for <s> is not clear. I will read the paper "Random graphs with arbitrary degree distributions and their applications" in the near future in order to understand the formulation in detail.

Link: http://arxiv.org/abs/1012.1358

Nenhum comentário:

Postar um comentário