MemcachedWorkload (Sigmetrics 12) 论文阅读

Workload Analysis of a Large-Scale Key-Value Store

本文我比较关心 workload generation，其他部分我会挑值得记录的写一写。

Overview

analyze five workloads from Facebook’s Memcached deployment
over 284 billion requests over a period of 58 sample days
analyze request composition, size and rate, cache efficacy, and temporal patterns
propose a Model to generate synthetic workloads

Statistical Modeling

Use ETC trace, obviously not include all the nuances of the trace such as its bursty nature or the inclusion of one-off events.

Emulate three properties: key size, value size, and inter-arrival rate[‘the time gap in microseconds between consecutive received requests’]
Use Pearson correlation coefficient, find these three are independent

Fit distributions (such as Weibull, Gamma, Extreme Value, Normal, etc.) to each data set, and choose the one that minimizes the Kolmogorov-Smirnov distance
Diurnal: divide raw data into 24 hourly bins and model each

本文虽然号称是对 memcached 这个 caching engine 做的 workload 分析，实际上完全不能用来生成 caching workloads，因为甚至没有 differentiate different objects，那就是 uniform distribution，那就应该做 tiering 或者直接放一些数据再也不动了。

Overview

Statistical Modeling

评论