Under the Hood of the Diamanti D10’s Container Storage Platform

When it comes to containers, storage has always been a hot topic, primarily because configuring and managing persistent storage for stateful containers in Kubernetes hasn’t been straightforward (and that’s putting it mildly). This adds to the substantial challenges many enterprises face in building out their own container infrastructure. In realizing Diamanti’s vision for a complete, turnkey container stack, we addressed the need for a reliable means of integrating storage systems with Kubernetes by developing the FlexVolume plug-in (recently supplanted by the Container Storage Interface). FlexVolume expands the set of policies and labels you can specify when you create a storage volume for your container. We then contributed the plug-in to the open source community.

While Diamanti handily fast-tracks Kubernetes storage configuration, the platform also delivers best-in-class container storage performance. In this blog, I’ll discuss our approach to building the Diamanti D10 platform’s storage system.

Powered By Intel® 3D NAND Flash

We built the Diamanti D10 appliance’s storage system on top of Intel® 3D NAND NVMe solid-state drives (SSDs), which are optimized for data center environments with their ability to drive multi-workload efficiency. The diagram below is an “under the hood” view of the D10 appliance, with the NVMe drives highlighted by a yellow dashed line.

Figure 1: An under-the-hood look at the Diamanti D10 appliance

Building a Container Storage System Fit for the Enterprise

Many of today’s scale-out storage technologies feature software-defined functionality driven by the host operating system. A major drawback of this approach is that it consumes valuable host CPU and memory resources: on the order of 30% to 50% when you factor in features such as RAID, snapshots, replication, data compression, and deduplication. Application performance is at risk when storage functions compete for the same compute resources.

With this in mind, our overall approach was to architect the Diamanti platform’s storage system in a way that would offload storage functions from the system CPU and RAM onto a dedicated controller. We then wanted to be able to guarantee service levels, giving users control over storage IOPS and latency on a per-container basis. Lastly, we set out to take full advantage of the raw speed and sub-millisecond latency that Intel® 3D NAND SSDs offer while minimizing drive wear.

Designing a storage system for a bare-metal container platform presented an immediate advantage: it was possible to use a log-structured file system, which is friendly to flash in terms of prolonging its endurance. The algorithms we’ve designed allow us to perform nearly sequential writes of data in 4K blocks.
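The core idea of a log-structured write path can be illustrated with a toy sketch. This is not Diamanti’s implementation; the class and field names are hypothetical, and it shows only the essential behavior: every write, including an overwrite of an existing logical block, lands at the head of an append-only log, so the flash device sees nearly sequential 4K writes.

```python
BLOCK_SIZE = 4096  # 4K blocks, as described above

class LogStructuredWriter:
    """Toy sketch of a log-structured write path: incoming writes are
    appended at the log head rather than overwriting data in place."""

    def __init__(self, capacity_blocks):
        self.capacity_blocks = capacity_blocks
        self.head = 0      # next physical block to write
        self.mapping = {}  # logical block -> current physical block

    def write(self, logical_block, data):
        assert len(data) == BLOCK_SIZE
        physical = self.head % self.capacity_blocks
        # An overwrite simply remaps the logical block to the new
        # location at the log head; the old physical block goes stale.
        self.mapping[logical_block] = physical
        self.head += 1
        return physical

writer = LogStructuredWriter(capacity_blocks=1024)
first = writer.write(7, b"\x00" * BLOCK_SIZE)
second = writer.write(7, b"\x01" * BLOCK_SIZE)  # overwrite lands at the next block
```

Note how the two writes to logical block 7 occupy consecutive physical blocks: on a virtualized stack without a log-structured layer, the same pair of overwrites would typically hit the same (or scattered) device locations as random writes.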

By contrast, a virtualized stack doesn’t employ a log-structured file system. The hypervisor has to field and translate IO requests to specific storage drives, which takes time. As a result, writes are scattered, essentially becoming random writes.

Our approach ensures that every IO that comes in goes through a fixed path, which enables us to guarantee deterministic latency for all types of IO (reads, writes, random or sequential).

Furthermore, our system tracks which blocks are free and which are occupied. Space management is handled without the need for garbage collection, an asynchronous process that consumes valuable IO bandwidth when it runs, typically on the order of 40% to 50% of storage system resources. Our algorithms were specifically designed to avoid garbage collection in log-structured file systems, which serves to improve storage performance.
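One way to make the idea concrete is a minimal sketch of synchronous space management (again hypothetical, not Diamanti’s actual algorithm): every physical block is tracked as free or occupied, and when a logical block is overwritten, its stale physical block is returned to the free set inline at write time, so no background garbage-collection pass is ever needed.

```python
class BlockSpaceMap:
    """Toy sketch of synchronous space management: stale blocks are
    reclaimed inline at write time instead of by a background GC pass."""

    def __init__(self, total_blocks):
        self.free = set(range(total_blocks))  # free physical blocks
        self.owner = {}                       # logical -> physical block

    def allocate(self, logical_block):
        physical = self.free.pop()            # take any free block
        stale = self.owner.get(logical_block)
        if stale is not None:
            self.free.add(stale)              # reclaim the old block inline
        self.owner[logical_block] = physical
        return physical

smap = BlockSpaceMap(total_blocks=8)
a = smap.allocate(3)
b = smap.allocate(3)  # overwrite: the old physical block returns to the free set
```

Because reclamation happens on the write path itself, the free-block count stays accurate at all times and no IO bandwidth is periodically diverted to cleaning.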

We also explicitly set out to minimize both the number of IO cycles and the amount of addressable storage consumed by metadata. We exceeded our targets and achieved:

  • Metadata consumes less than 3% of overall addressable storage.
    • For example, with 8TB SSDs, we take 17GB + 2% of total capacity, which equates to 2.2%.
  • Metadata reads and writes comprise less than 5% of total IOPS available to applications.
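The 2.2% figure follows directly from the stated formula (a fixed 17GB plus 2% of total capacity); a quick check:

```python
# Metadata overhead for an 8 TB drive: fixed 17 GB plus 2% of capacity.
capacity_gb = 8_000  # 8 TB, using decimal gigabytes as drive vendors do
metadata_gb = 17 + 0.02 * capacity_gb
overhead = metadata_gb / capacity_gb
print(f"{metadata_gb:.0f} GB metadata -> {overhead:.1%} of capacity")
# prints: 177 GB metadata -> 2.2% of capacity
```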

High-Performance IO

The Diamanti UI screenshot below demonstrates its fast storage performance:

Figure 2: A three-node Diamanti cluster delivers 3.2M IOPS (shown in the hero cards at the top of the screen as 1.8M Read IOPS plus 1.4M Write IOPS).

The containers shown in the table within the Diamanti UI’s “Application” view are each assigned to one of three distinct service levels: High and Medium (whose performance can be specified by the user) and Best Effort. Diamanti establishes isolation across these lanes. For example, newly deployed workloads assigned to the “Best Effort” service level have no impact on the IOPS or latency guaranteed for neighboring workloads assigned to the “High” and “Medium” service levels. However, when there is unused capacity in either of those service-level bands, the excess resources are automatically shared with “Best Effort” workloads, with no manual intervention needed. This is analogous to how CPU and memory resources are managed in Kubernetes.
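The Kubernetes analogy can be made concrete: Kubernetes derives a quality-of-service class for each pod from its CPU and memory requests and limits, and BestEffort pods are the first to yield resources under pressure. The sketch below is a simplified version of that classification rule (the real rule also requires limits on both CPU and memory for every container to reach Guaranteed), shown only to illustrate the parallel with the storage service levels above.

```python
def k8s_qos_class(containers):
    """Simplified Kubernetes QoS classification from per-container
    resource requests and limits (dicts like {"cpu": "1"})."""
    # No container requests or limits anything: BestEffort.
    if all(not c.get("requests") and not c.get("limits") for c in containers):
        return "BestEffort"
    # Every container's requests exactly equal its limits: Guaranteed.
    if all(c.get("requests") and c.get("requests") == c.get("limits")
           for c in containers):
        return "Guaranteed"
    # Anything in between: Burstable.
    return "Burstable"

print(k8s_qos_class([{}]))                                                  # BestEffort
print(k8s_qos_class([{"requests": {"cpu": "1"}, "limits": {"cpu": "1"}}]))  # Guaranteed
print(k8s_qos_class([{"requests": {"cpu": "1"}}]))                          # Burstable
```

Just as a Guaranteed pod’s CPU and memory are protected from BestEffort neighbors while idle capacity can still be consumed opportunistically, the D10’s High and Medium storage lanes are protected from Best Effort workloads, which soak up whatever IOPS headroom remains.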

For more info on the Diamanti bare-metal container platform, please check out the product page. Stay tuned for more container storage-related insights in our next blog!