Thursday, July 26, 2012

Identifying relevant social media content leveraging information diversity and user cognition

This is another paper related to a research project that I'm working on. It studies the problem of finding relevant social media content (e.g., tweets, URLs). The research was developed by people from Rutgers University and Microsoft Research.

The main motivation for this paper is the popularity of social media applications such as Twitter. The authors argue that this kind of media is especially useful for information seeking and for exploring timely happenings. However, given the huge volume of user-generated content on these social media websites, how can we identify relevant content? This is the main question studied in the paper. The large scale and high growth rate of social media, combined with the rich set of attributes (e.g., network properties, location, time), make this question really challenging from a research perspective. One specific point the authors make to motivate their research is that static structural metrics, such as PageRank and HITS, are not suitable for social media because authorities in this scenario are likely to be emergent and temporally dynamic.

The first results presented in the paper come from a small survey showing that most Twitter users rely on the Twitter search interface or on external search/exploration tools (e.g., Bing Social) to search for and explore content on Twitter.

One specific aspect emphasized throughout the paper is the importance of diversity in social media content. In other words, in several scenarios users may be interested in heterogeneous information on a given topic. The example given is the oil spill, for which broad coverage is desirable.

Unlike most of the papers on social media I've read so far, this paper focuses on more cognitive and subjective user perceptions of the content. Therefore, the evaluation is based on user studies instead of traditional metrics such as precision and recall. In fact, the authors state that there is no ground-truth information to support a large-scale study of the quality metrics they are interested in. More specifically, they define relevant content as content that is interesting, informative, engaging, and better remembered. The empirical studies presented in the paper are all based on Twitter data.

To be more formal, the problem studied in the paper is based on the definition of entropy, which measures the degree of uncertainty of a given variable. Moreover, the authors define the following set of content attributes for Twitter content (i.e., tweets):

  • Diffusion property (is it an RT?)
  • Responsivity nature (is it a reply?)
  • Temporal relevance (when was it posted?)
  • Thematic association (topic given by OpenCalais)
  • Authority dimension (how many followers does the author have?)
  • Geographic dimension (timezone)
  • Hub dimension (how many followees/friends does the author have?)
  • Degree of activity (how many tweets has the author produced so far?)
Given these attributes, the problem studied in the paper is the following: given a stream of tweets T, a diversity parameter w, and a sample size s, determine a tweet set T_w such that |T_w| = s and the diversity level (entropy of the attributes) of T_w is w. In practice, the goal is for T_w to have a diversity level as close to w as possible. Moreover, it is important to define an ordering of T_w in terms of diversity.
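
To make the entropy-based notion of diversity concrete, here is a minimal sketch, assuming each attribute is treated as a categorical variable and that per-attribute entropies are normalized to [0, 1]. The function name and the normalization are my own reading, not the paper's exact formulation.

```python
import math
from collections import Counter

def normalized_entropy(values):
    """Shannon entropy of a list of categorical values, scaled to [0, 1]."""
    if not values:
        return 0.0
    counts = Counter(values)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    h = -sum(p * math.log(p) for p in probs)
    max_h = math.log(len(counts)) if len(counts) > 1 else 1.0
    return h / max_h

# Example: diversity of the "thematic association" attribute in a small sample of tweets
topics = ["politics", "sports", "politics", "technology", "politics"]
print(normalized_entropy(topics))  # values closer to 1 mean more diverse topics
```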

The authors asked a group of 11 users how important each of these attributes is to them, but I could not find anything interesting in the results. I believe that asking users was not a good idea, because they may not be aware of how important the attributes actually are to them. The authors used the results of this survey as attribute weights, but found that these weights do not improve the proposed technique significantly.

The proposed solution to the problem of selecting relevant content is divided into two parts: (1) a content selection method and (2) a content organization method.

Content selection: First, the set of tweets is filtered according to the particular topic of interest. Each piece of content (tweet) is represented as a set of attributes, and it seems the text of the tweet itself is not used. Then, the following heuristic is applied to select a sample of tweets (a rough sketch in code follows the list):
  1. Create an empty set T_w
  2. Pick a tweet at random and add it to T_w
  3. For each remaining tweet, compute the distortion of the entropy (l_1 norm) of the current sample if that tweet were added
  4. Add the tweet that brings the entropy level closest to the desired level w
  5. If the sample has s tweets, stop; otherwise, go back to step 3
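
Here is a minimal sketch of this greedy selection, assuming tweets are dicts of categorical attributes and that the distortion is the l_1 distance between the sample's per-attribute normalized entropies and the target w. This is my interpretation of the paper's short description, reusing normalized_entropy from the sketch above.

```python
import random

def distortion(sample, attributes, w):
    """l_1 distance between the per-attribute normalized entropies and the target w."""
    return sum(abs(normalized_entropy([t[a] for t in sample]) - w) for a in attributes)

def select_sample(tweets, attributes, w, s):
    remaining = list(tweets)
    sample = [remaining.pop(random.randrange(len(remaining)))]  # step 2: random seed tweet
    while len(sample) < s and remaining:
        # steps 3-4: add the tweet whose inclusion leaves the sample's entropy closest to w
        best = min(remaining, key=lambda t: distortion(sample + [t], attributes, w))
        remaining.remove(best)
        sample.append(best)
    return sample

# e.g., select_sample(tweets, ["is_rt", "is_reply", "topic", "timezone"], w=0.7, s=10)
```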

Content organization: Tweets are ordered in terms of their distortion of the normalized entropy with respect to w.
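
One plausible reading of this ordering step, continuing the sketch above (an assumption on my part, not necessarily the authors' exact procedure): rank each tweet in the sample by how much the sample's entropy would drift away from w if that tweet were removed.

```python
def organize(sample, attributes, w):
    # leave-one-out ranking: a tweet ranks higher the more the sample's entropy
    # would move away from the target w without it
    def impact(t):
        rest = [x for x in sample if x is not t]
        return distortion(rest, attributes, w)
    return sorted(sample, key=impact, reverse=True)
```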

The dataset used in the evaluation of the proposed method contains 1.4B tweets. As baselines, the authors considered some variations of the proposed method, as well as one method that always returns the most recent tweet and another that always returns the most tweeted content. The particular content of interest is URLs (although some of the attributes considered are associated with the tweet that contains the URL).


The above results show that the diversity of the returned samples is close to the desired levels (w).

The user study described in the paper involved 67 users, and the authors considered two topics: oil spill and iPhone. The evaluation criteria are interestingness, informativeness, engagement, and memorability. A set of 12 samples containing 10 tweets each was presented to the users. Each user was asked the following questions:

  1. How long did you take reading the sample?
  2. How interesting is it?
  3. How diverse is it?
  4. How informative is it?

The actual time taken was compared against the perceived time as a measure of engagement. Moreover, the authors checked whether the users were able to remember the tweets they saw after a time interval, as a measure of memorability. The level of diversity was also varied.

The level of interaction between the variables involved in the experiments was checked using a repeated-measures ANOVA procedure. The results showed that the interactions are not significant. The following results show that the proposed method (PM) outperforms the baselines in terms of all metrics considered.
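
A quick aside before the results (this is not from the paper): a repeated-measures interaction check like this could be run in Python with statsmodels, assuming a long-format table with one rating per user and condition. The numbers below are made up purely for illustration.

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Made-up long-format data: one "interest" rating per (user, method, diversity level)
ratings = pd.DataFrame({
    "user":      [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "method":    ["PM", "PM", "recent", "recent"] * 3,
    "diversity": ["low", "high", "low", "high"] * 3,
    "interest":  [4, 5, 3, 3, 5, 4, 2, 3, 4, 5, 3, 2],
})

# Within-subject factors: method and diversity level; the interaction term is
# part of the standard repeated-measures output
result = AnovaRM(ratings, depvar="interest", subject="user",
                 within=["method", "diversity"]).fit()
print(result.anova_table)
```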


Moreover, PM generates results for which the diversity is better perceived by the users.


It can be noticed that the relationship between the actual and the perceived diversity is not linear. The next results show how the different quality measures are affected by the diversity level.


In general, good results are achieved at low and high, but not at intermediate, diversity levels. This result is intriguing.

I was not expecting much from this paper, but it ended up being a good read. I think the approach taken to weight the attributes was naive. Also, some of the most interesting results in the paper are not well supported. For instance, I would not expect both very low and very high diversity to be appreciated by the users, yet the explanations the authors give for this are weak. I believe some of the findings of this paper may be due to the particular entropy measure applied.

Link:  http://research.microsoft.com/en-us/um/people/counts/pubs/relevantsocialmedia_ht2011.pdf
