This paper is co-authored by researchers from Universidade Federal de Minas Gerais and Boston University. It presents the first characterization of a live streaming workload at multiple levels. The main idea of the paper is to compare live streaming with other workloads, such as those of servers that stream stored objects. The resulting characterization also supports the generation of realistic synthetic live streaming media workloads.
The authors mention two important differences between stored and live streaming media systems:
1) In live streaming, the value of the content is in its liveness. Therefore, unlike with stored media, rejecting new connections during a server overload is not a viable alternative.
2) Live streaming workloads are likely to exhibit stronger temporal patterns.
The workload studied was collected over one month from a popular live streaming media server. It comprises two live streaming media objects that cover a Brazilian reality show.
The hierarchical methodology applied in the paper analyzes the workload at three levels (layers):
1) Client layer: Based on individual users;
2) Session layer: Based on user sessions;
3) Transfer layer: Based on transfers that compose sessions.
Based on this methodology, the paper characterizes important aspects of each layer proposed.
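To make the three layers concrete, here is a minimal Python sketch of how a flat log of transfers could be grouped into clients, sessions and transfers. It is not the authors' code. The record layout `(client_id, start, end)` and the `session_timeout` inactivity threshold are assumptions made for illustration; the review does not state how the paper delimits sessions.

```python
from collections import defaultdict


def build_hierarchy(transfers, session_timeout=1800.0):
    """Group (client_id, start, end) transfers into per-client sessions.

    session_timeout is a hypothetical inactivity threshold in seconds:
    a gap longer than this between the end of the previous activity and
    the start of the next transfer opens a new session.
    """
    # Client layer: collect every transfer under its client.
    by_client = defaultdict(list)
    for client_id, start, end in transfers:
        by_client[client_id].append((start, end))

    hierarchy = {}
    for client_id, items in by_client.items():
        items.sort()  # order transfers by start time
        # Session layer: a list of sessions, each a list of transfers.
        sessions = [[items[0]]]
        last_end = items[0][1]
        for start, end in items[1:]:
            if start - last_end > session_timeout:
                sessions.append([])  # long silence: start a new session
            sessions[-1].append((start, end))  # transfer layer
            last_end = max(last_end, end)
        hierarchy[client_id] = sessions
    return hierarchy


log = [("a", 0, 10), ("a", 20, 30), ("a", 5000, 5010), ("b", 1, 2)]
hierarchy = build_hierarchy(log)
```

With the default threshold, client "a" ends up with two sessions (the first holding two transfers) and client "b" with one.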
Client Layer
The distributions of IPs/AS, transfers/AS and transfers/country suggest a Zipf-like profile, and the number of active clients can be described by an exponential distribution. The workload shows clear temporal patterns, especially daily ones, and an autocorrelation analysis confirms the strength of the daily patterns. Client interarrival times can be described by a Pareto distribution. The authors further show that, although the arrival process is non-stationary, it can be described by a sequence of stationary Poisson arrival processes whose average arrival rates reflect the temporal variation found. The number of sessions/user also follows a Zipf-like function.
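The idea of approximating a non-stationary arrival process by a sequence of stationary Poisson processes can be sketched in a few lines of Python. This is an illustration under assumptions, not the paper's procedure: the one-hour `window` and the example rates are hypothetical, and the function name is made up for this sketch.

```python
import random


def piecewise_poisson_arrivals(rates, window=3600.0, seed=0):
    """Generate arrival times from a sequence of stationary Poisson processes.

    rates[i] is the mean arrival rate (arrivals per second) that holds
    during window i; the window length is an assumed value.
    """
    rng = random.Random(seed)
    arrivals = []
    for i, rate in enumerate(rates):
        if rate <= 0:
            continue  # no arrivals in an idle window
        start = i * window
        end = start + window
        # Within a window, interarrival times are exponential with the
        # window's rate. Restarting at each boundary is safe because the
        # exponential distribution is memoryless.
        t = start + rng.expovariate(rate)
        while t < end:
            arrivals.append(t)
            t += rng.expovariate(rate)
    return arrivals


# Hypothetical rates: a quiet hour followed by a busy hour.
arrivals = piecewise_poisson_arrivals([0.01, 0.1])
```

Varying `rates` from window to window is what carries the daily pattern, while each window on its own remains a plain Poisson process.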
Session Layer
Session size follows a lognormal distribution and OFF times follow an exponential distribution. The number of transfers/session is described by a Pareto distribution, and arrival times within a session can be fitted by a lognormal distribution.
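As a rough illustration of how these session-level fits could drive a synthetic generator, the Python sketch below draws one session from the three distribution families named above. All parameter values are hypothetical placeholders rather than the values fitted in the paper, and rounding the Pareto draw down to an integer transfer count is a simplification of this sketch.

```python
import math
import random


def sample_session(rng, size_mu=5.0, size_sigma=1.5,
                   off_mean=3600.0, pareto_alpha=2.0):
    """Draw one synthetic session; every parameter here is a placeholder."""
    # Session size: lognormal (mu and sigma of the underlying normal).
    session_size = rng.lognormvariate(size_mu, size_sigma)
    # OFF time before the next session: exponential with mean off_mean.
    off_time = rng.expovariate(1.0 / off_mean)
    # Transfers per session: Pareto draw (always >= 1), rounded down.
    n_transfers = max(1, math.floor(rng.paretovariate(pareto_alpha)))
    return {
        "session_size": session_size,
        "off_time": off_time,
        "n_transfers": n_transfers,
    }


rng = random.Random(42)
sessions = [sample_session(rng) for _ in range(1000)]
```

Passing the random generator in explicitly keeps runs reproducible, which helps when comparing a synthetic trace against a measured one.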
Transfer Layer
The number of concurrent transfers follows an exponential distribution, and transfers present a temporal pattern similar to the one found for client activity. Transfer interarrivals can be described by two Pareto distributions, which the authors argue to be related to the existence of two classes of content, the popular and the unpopular ones. Transfer length follows a lognormal distribution, a characteristic related to traffic self-similarity and with important implications for communication performance. It is interesting to note that, in the particular case of live streaming, transfer length is a direct result of user interactions rather than object characteristics.
Representativeness of Findings
The authors applied the same methodology to another live streaming workload, from a news and sports radio station. They found similar results for both workloads, apart from the fitted parameter values and the interarrival times, which are better described by a lognormal distribution in the second workload. The authors argue that this result is due to the nature of the interactions between clients and objects in the streaming media services.
Synthesis of Media Workloads
The Gismo toolset for generating synthetic workloads was extended to enable the generation of live streaming workloads using the results found.
This paper is very well written and presents a thorough analysis of a rich workload. One aspect that deserved more discussion is the impact of the small number of objects (only 2) on the results. Modern live streaming services, such as justin.tv, have a large number of streams available, which may have important implications for the results presented in this paper.
Link: http://conferences.sigcomm.org/imc/2002/imw2002-papers/180.pdf