GL-Cache: Group-level learning for efficient and high-performance caching

Introduction

“learned caches”: employed machine learning to improve cache evictions

  • object-level learning (e.g. LRB): learns the next access time for each object using dozens of object features and evicts the object with the furthest predicted request time
    • significant computation and storage overheads
    • Zipf distributions “most objects only get a limited number of requests. This leads to limited object-level information for learning”
  • learning-from-distribution (e.g. LHD): models request probability distributions to inform eviction decisions
    • 必须 random sample,利用的信息有限,绝大部分 object 在 distribution 的尾部、信息很少
  • learning-from-simple-experts (e.g. LeCaR and Cacheus): performs evictions by choosing eviction candidates recommended by experts (e.g., LRU and LFU), and updates experts’ weights based on their past performance on the workload
    • A delay exists between a bad eviction and an update on the expert’s weight (“delayed rewards” in reinforcement learning).

Group-level Learned Cache:

  • cluster similar objects into groups using write time
  • evict the least useful groups using a merge-based eviction
  • introduce a group utility function to rank groups

Background and motivation

Questions:

  • “Object-level learning needs to sample objects and perform inference at each write (eviction). For example, LRB samples 32 objects and copies their features to a matrix for inference for each eviction. In our measurement, each evic- tion (including feature copy, inference, and ranking) takes 200 μs on one CPU core, indicating that the cache can evict at most 5,000 objects on a single core per second.” 200 us 这实验是怎么做的,这数字怎么这么大

GL-Cache: Group-level learned cache

  • Grouping by write time
    • PS: allow an efficient implementation using a log-structured cache
  • Learning object-group utility
    • gradient boosting machines (GBM) because tree models do not require feature normalization
    • formulate the learning task as a regression problem that minimizes the mean square loss (L2) of object-group utilities
    • A sampled group may be evicted before being used for training. Such evictions halt the tracking of group utility. GL-Cache keeps ghost entries for objects which have not been factored into group utility.
    • 一次扔掉最差的 group 附近的 \(N_{group}\) 个 group,这个参数可能设置成整体的 1% 之类的

Questions:

  • 如果有好几个 application 同时写入,凭什么说有相似的 write-time 的 object 相似呢?
  • “GL-Cache generates new training data by sampling cached object groups, and it copies the features of the sampled groups into a pre-allocated memory region” 这不是也有 random sampling (v.s. learning-from-distribution)?

Evaluation

放两个有意思的图: