Workload Analysis of a Large-Scale Key-Value Store

本文我比较关心 workload generation,其他部分我会挑值得记录的写一写。

Overview

  • analyze five workloads from Facebook’s Memcached deployment
  • over 284 billion requests over a period of 58 sample days
  • analyze request composition, size and rate, cache efficacy, and temporal patterns
  • propose a Model to generate synthetic workloads

Statistical Modeling

Use ETC trace, obviously not include all the nuances of the trace such as its bursty nature or the inclusion of one-off events.

  • Emulate three properties: key size, value size, and inter-arrival rate[‘the time gap in microseconds between consecutive received requests’]
  • Use Pearson correlation coefficient, find these three are independent
  • Fit distributions (such as Weibull, Gamma, Extreme Value, Normal, etc.) to each data set, and choose the one that minimizes the Kolmogorov-Smirnov distance
  • Diurnal: divide raw data into 24 hourly bins and model each

评论

本文虽然号称是对 memcached 这个 caching engine 做的 workload 分析,实际上完全不能用来生成 caching workloads,因为甚至没有 differentiate different objects,那就是 uniform distribution,那就应该做 tiering 或者直接放一些数据再也不动了。