Efficient MRC Construction with SHARDS

简单来说就是 hash key (所谓 spatially hash) 并且用很小的一部分 (e.g. <1%) 可以达到很接近完整计算的结果。

我在乎的是这种 hashing 的策略(可能如 lecture Q&A 所说再加一点 temporal hashing) 是不是可以推广到更广泛的问题;以及具体实现的时候有遇到些什么考量。

Abstract

SHARDS = Spatially Hashed Approximate Reuse Distance Sampling

Introduction

SHARDS Sampling Algorithm

如果 fixed-rate construction,非常简单,固定 R 即可;

如果 fixed-sized construction,如图里蓝色框所示,只需要随着新的内容进来,不断降低 R(当然得清除高于 R 的记录),并且对记录 rescale (1/R),注意,rescale 不止对 x 轴,也对 y 轴。

Design and Implementation

不懂:

Experimental Evaluation

  • Spatial Hash 得到的并不一定刚好是 1/R 的记录,缺失的记录被认为是热的,所以 counter 加在了第一个 bucket:”Ideally, the average number of repetitions per block should be the same for both the sample set and the complete trace. This happens when the actual number of sampled references, Ns, matches the expected number, E[Ns] = N * R. When this does not occur, we find that it is because the sample set contains the wrong proportion of frequently accessed blocks. Our correction simply adds the difference, E [Ns ] − Ns, to the first histogram bucket before computing final miss ratios.”
  • 也可以 simulate more sophisticated policies, such as LIRS, ARC, CAR, or Clock-Pro

评论

快速过实验结果可以看 SHARDS slides

虽然思想很简单,但是效果惊人,感觉也可以用在优化 OSCA 这样的算法到 practical