Cache Modeling and Optimization using Miniature Simulations

这些作者之前做了 SHARDS 这种适用于很多场景的工作,这一篇也是这样风格的大而全:既然不能为每一个算法建具体的数学模型,不如设计一种 low-overhead approach 可以做到什么算法都行!

Abstract

  • evaluate the effectiveness of this approach for modeling non-LRU algorithms, including ARC, LIRS and OPT
  • SLIDE performs shadow partitioning transparently within a single unified cache, avoiding the problem of migrating state between distinct caches when partition boundaries change

Questions:

  • 什么是 “problem of migrating state between distinct caches when partition boundaries change”

Introduction

  • stack algorithm (inclusion property)
    • single-pass MRC generation
    • monotonic
    • e.g. LRU, LFU, MRU
  • non-stack algorithm
    • no known single-pass MRC generation (separate runs for each cache size)
    • non-monotonic
    • e.g. 2Q, ARC, LIRS, OPT

Non-stack Algorithms

** 2Q/ARC/LIRS 都有 tunable parameter

** OPT: Belady first described OPT, the theoretically optimal algorithm, also known as MIN [4, 1, 13]. OPT is a “clairvoyant” algorithm, since it relies on knowledge of future references to evict the block that will be reused the farthest in the future. Although OPT is actually a stack algorithm, it cannot be used to implement online eviction. Instead, OPT provides a bound on the performance of realizable algorithms.

Scaled-down Modeling

Miniature Simulations

emulated cache size Se, mini-cache size Sm, and input sampled rate R, -> Se = Sm/R

In practice, it is useful to enforce reasonable constraints on the minimum mini-cache size (e.g. Sm >= 100) and sampling rate (R >= 0.001) to ensure sufficient cache space and enough sampled references to simulate meaningful behavior.

因为 sample rate R 不能确保真正 # of sampled references Ns = E[Ns] = N * R,这里认为偏差的那部分 (|Ns - E[Ns]|) 因为 the wrong proportion of frequently-accessed blocks,所以本文建议使用 # of misses / E[Ns] (x m/Ns)

不懂:

  • We have experimented with an alternative “unified” approach that integrates MRC construction into a live production cache, without running separate simulations. Spatial hashing shards requests across a set of cache partitions, all serving actual requests. Several small partitions serve as monitoring shards, emulating multiple cache sizes within a small fraction of the overall cache.
  • An MRC can be generated on demand by simply accessing the miss ratios associated with each monitoring shard. Although integrated monitoring avoids additional simulation costs, we found that it typically degrades overall cache performance slightly, since most monitoring shards will not have efficient operating points.

Adapting Cache Parameters

  • divide the input reference stream into a series of epoches
  • after each epoch, miss ratios for each mini-cache (different param) are examined, and the best parameter is applied to the actual cache

SLIDE

** SLIDE, Sharded List with Internal Differential Eviction, maintains a single unified cache, and defers partitioning decisions until eviction time.

TBD