D3N: A multi-layer cache for the rest of us

networking + cache auto-configuration

Abstract

  • big-data jobs assume high bandwidth between compute and storage
  • in practice the network is far from ideal: large bandwidth imbalances arise from over-subscription (the aggregate bandwidth of the input links can greatly exceed the capacity of the outgoing links, on the assumption that the inputs are rarely all saturated at once) and from incremental networking upgrades, with new inputs added over time ("organic growth")
  • the paper presents a 2-layer D3N cache, built on the Ceph RADOS Gateway

Introduction

D3N = Datacenter-Data-Delivery Network

  • D3N uses high-speed storage (e.g. NVMe flash or DRAM) to cache datasets on the access side of links in a hierarchical network, dynamically allocating cache space across layers based on observed workload patterns and link speeds, so that cache capacity is preferentially used for traffic crossing the most over-subscribed links.
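
A rough idea of this dynamic allocation can be sketched as below. This is a minimal Python sketch under my own assumptions, not the paper's algorithm: the partition_cache function and the "link pressure" weighting (observed miss bandwidth divided by link capacity) are hypothetical, and only illustrate the idea of giving more cache space to the layer whose link is more over-subscribed.

    # Hypothetical sketch: split one cache server's capacity between its L1
    # role (serving local clients) and its L2 role (home for remote chunks),
    # in proportion to how much miss traffic each layer would otherwise push
    # across its (possibly over-subscribed) link.
    def partition_cache(total_bytes,
                        l1_miss_bw, l1_link_bw,   # observed L1 miss traffic vs. rack uplink capacity (Gb/s)
                        l2_miss_bw, l2_link_bw):  # observed L2 miss traffic vs. data-lake link capacity (Gb/s)
        """Return (l1_bytes, l2_bytes), favoring the more congested link."""
        # "Pressure" = how close each link is to saturation from miss traffic alone.
        l1_pressure = l1_miss_bw / l1_link_bw
        l2_pressure = l2_miss_bw / l2_link_bw
        total_pressure = (l1_pressure + l2_pressure) or 1.0  # avoid divide-by-zero
        l1_bytes = int(total_bytes * l1_pressure / total_pressure)
        return l1_bytes, total_bytes - l1_bytes

    # Example: the inter-cluster link is far more over-subscribed, so the L2
    # partition gets most of the 1 TiB of NVMe on this cache server.
    print(partition_cache(1 << 40,
                          l1_miss_bw=20, l1_link_bw=100,
                          l2_miss_bw=35, l2_link_bw=40))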

Questions:

  • “D3N has required no changes to the interfaces of any Ceph services, involves no additional meta-data services (e.g, to locate cached blocks), and all policies are implemented based purely on local information.” really?

Motivation

The architecture diagram from the paper:

As mentioned earlier, the D3N cache is a 2-layer design; mapped onto the architecture diagram, one layer sits at the intra-rack level and the other at the inter-rack level.

Why this design? The paper explains that “differing upgrade schedules and split ownership typically prevent inter-cluster networks from being upgraded at the same time, resulting in significant bandwidth mismatches across compute clusters.” So besides using caches to absorb requests, the intra-rack and inter-rack cache sizes also need to be adjusted dynamically to balance the traffic (although this is of course not a cure-all).

The evaluation example also confirms the intuition that “the multi-level approach offers better performance than either a pure L1 or pure L2 approach for 4 cache servers or more”.

D3N Architecture

  • For simplicity and resiliency, as well as for integration in existing storage solutions, all caching and routing decisions are based on local information rather than central coordination.
  • Each chunk (4 MiB) has a “home location” within the L2 cache, and L1 misses are forwarded to the chunk home location. Only in the event of a miss at the home location is a request forwarded to the data lake, the results of which are cached at both the home (i.e. L2) and client-serving (L1) locations. The L1 and L2 caches are unified: L1 requests for a chunk received at that chunk’s home location result in a single cached copy of the data. (See the read-path sketch after this list.)

  • Limitations

    • Local cache management: Since caching decisions are performed locally within the cache pools in each layer, D3N can make globally suboptimal caching decisions. For example, a few popular blocks can be replicated across all the L1 caches, flooding the capacity and preventing caching of slightly less popular blocks.
    • Lack of fairness: Compute clusters participating in D3N share resources such as rack space, storage, power, and network bandwidth with D3N. Even though D3N tries to provide a common good by eliminating network bottlenecks, the individual benefits each cluster gets from D3N may be different and disproportionate to the resources they provide.
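
A minimal Python sketch of the read path described above: the chunk's home location is derived by hashing the chunk key over the L2 nodes, so no extra metadata service is needed to locate cached blocks. The names (home_node, read_chunk, the nodes' get/put methods) are hypothetical; only the L1 -> L2-home -> data-lake lookup order and the caching of misses at both locations follow the paper's description.

    import hashlib

    CHUNK_SIZE = 4 * 1024 * 1024   # D3N caches fixed-size 4 MiB chunks;
                                   # a chunk key would name the object plus the chunk offset

    def home_node(chunk_key, l2_nodes):
        """Pick the chunk's deterministic "home location" among the L2 caches.
        Hashing the key means every node can compute this locally."""
        h = int(hashlib.md5(chunk_key.encode()).hexdigest(), 16)
        return l2_nodes[h % len(l2_nodes)]

    def read_chunk(chunk_key, local_cache, l2_nodes, data_lake):
        """Hypothetical read path: local L1 -> chunk's L2 home -> data lake."""
        data = local_cache.get(chunk_key)        # 1. L1 lookup on the client-serving node
        if data is not None:
            return data

        home = home_node(chunk_key, l2_nodes)
        if home is local_cache:
            # Unified L1/L2: this node *is* the chunk's home, so its miss goes
            # straight to the data lake and only one copy is cached here.
            data = data_lake.get(chunk_key)
            local_cache.put(chunk_key, data)
            return data

        data = home.get(chunk_key)               # 2. forward the L1 miss to the home (L2)
        if data is None:
            data = data_lake.get(chunk_key)      # 3. only a home miss reaches the data lake
            home.put(chunk_key, data)            #    cache the result at the home (L2) ...
        local_cache.put(chunk_key, data)         #    ... and at the client-serving node (L1)
        return data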

Implementation within Ceph

Questions:

  • I haven't fully understood Figure 6 yet

Evaluation