Reinventing SSD Architecture at Fadu

November 26, 2024

Introduction: The Need for a New SSD Architecture

The storage industry is undergoing a significant transformation. With the rise of artificial intelligence (AI) and the introduction of high-speed interfaces like PCIe 5.0 and 6.0, traditional SSD architectures are being pushed beyond their limits. These advancements necessitate a fundamental shift in the design and construction of storage solutions.

For years, the default method to improve SSD performance has been adding more powerful Arm cores. While effective to an extent, this approach has limitations: it increases power consumption, raises costs, and eventually hits diminishing returns. Moving all SSD operations to hardware automation may sound appealing, but it can limit flexibility, preventing rapid adaptation to new technologies and standards.

In this blog post, we explore how Fadu has reimagined SSD architecture to overcome these challenges, providing a scalable, flexible, high-performance, and power-efficient solution.

The Scalability Challenge in Conventional SSDs

The typical approach to scaling SSD performance involves adding more or higher-performance Arm cores. At first glance, this strategy seems logical: more cores should mean more performance. However, it introduces significant limitations:

Increased Cost and Die Size: Adding cores to keep up with PCIe bandwidth demands inflates both cost and physical footprint.

Diminishing Returns: More cores do not guarantee proportional performance gains. Data pathways can become bottlenecked by unbalanced loads, bus congestion, interrupt latency, and context switching.

Higher Power Consumption: Increasing the number or power of cores naturally leads to greater energy consumption.

Rather than solving scalability at its root, this method adds complexity, hindering overall system performance.
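The diminishing-returns point above can be illustrated with Amdahl's law. This is a simplified textbook model, not a measurement of any real controller, and the 20% serial share (standing in for bus congestion, interrupt handling, and context switching) is purely an illustrative assumption.

```python
def amdahl_speedup(cores: int, serial_fraction: float) -> float:
    """Ideal speedup when `serial_fraction` of the work cannot run in parallel."""
    if cores < 1:
        raise ValueError("cores must be >= 1")
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be within [0, 1]")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


# Assumed value for illustration only; not taken from any real SSD controller.
SERIAL_FRACTION = 0.2

for cores in (1, 2, 4, 8, 16):
    speedup = amdahl_speedup(cores, SERIAL_FRACTION)
    print(f"{cores:2d} cores -> {speedup:.2f}x")
```

Under this assumption, 16 cores deliver only a 4.0x speedup, and no number of cores can exceed 5x, while cost and power keep growing with every core added.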

Typical SSD Controller Design

Traditional SSD controllers incorporate some hardware automation, but the majority of the Flash Translation Layer (FTL) and core data path reside in firmware, managed by Arm CPUs. This introduces inefficiencies, particularly with newer PCIe generations.

Figure: Adding Arm cores to scale performance (Source: Marvell)

For instance, a recent FMS presentation by Qingru Meng of Solidigm showed that NAND consumes up to half of a PCIe 5.0 SSD’s power, and that this share could reach 70% in future PCIe 6.0 SSDs. She points out that scaling Arm cores within a traditional architecture would push total SSD power well beyond the TDP of modern form factors.

To meet these new performance thresholds while maintaining efficiency, SSD controllers must evolve.
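To make the power squeeze concrete, the short calculation below applies the NAND shares cited above (up to 50% and up to 70%) to a fixed device power envelope. The 25 W envelope is a hypothetical figure chosen only for the arithmetic; actual limits depend on the form factor.

```python
def non_nand_budget_watts(device_power_w: float, nand_share: float) -> float:
    """Power left for controller, DRAM, and everything else after NAND takes its share."""
    if device_power_w <= 0:
        raise ValueError("device_power_w must be positive")
    if not 0.0 <= nand_share <= 1.0:
        raise ValueError("nand_share must be within [0, 1]")
    return device_power_w * (1.0 - nand_share)


# Hypothetical power envelope, for illustration only.
ENVELOPE_W = 25.0

# NAND shares are the upper bounds quoted in the text.
for generation, nand_share in (("PCIe 5.0", 0.50), ("PCIe 6.0", 0.70)):
    budget = non_nand_budget_watts(ENVELOPE_W, nand_share)
    print(f"{generation}: {budget:.1f} W left for the controller and other components")
```

With the same envelope, the non-NAND budget drops from 12.5 W to 7.5 W while the controller is expected to handle twice the interface bandwidth, which is why simply adding cores does not fit.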

The Flexibility Problem with Over-Automation

The opposite approach—full hardware automation—also presents issues. SSD firmware is inherently complex, and relying solely on hardware significantly reduces flexibility. This has several drawbacks:

Limited Firmware Development: Over-automated systems make implementing new features or fixing bugs difficult without major hardware modifications.

NAND Support Constraints: When new NAND flash types are introduced, rigid architectures struggle to provide timely support, delaying product rollouts.

Inflexibility to Adapt: Inability to adapt quickly to new protocols or standards risks obsolescence in the fast-moving SSD market.

Excessive hardware automation constrains developers, reducing the potential for innovation and adaptability—critical needs in today’s AI-driven landscape.

Architecture	Hardware Acceleration	Flexibility (NAND, new features, bugs)	Programmable control plane, pipelined	Power efficiency	Performance Scalability
Conventional SSD – many Arm cores, FW FTL	Low	High	No	Bad	Bad
RTL based HW-acceleration	High	Low	No	Good	Good
FADU architecture	High	High	Yes	Good	Good

Table 1: Comparing performance, power efficiency, and flexibility

Reinventing SSD Architecture: Fadu’s Approach

Fadu founder and CTO Peter Nam, a leading researcher in SSD technology, and his team pioneered a new approach to SSD architecture. The result? A programmable control plane comprising multiple tiny embedded processors that pipeline commands, combined with intelligent hardware offloading for common functions such as NVMe and FTL/garbage collection. This balance enhances performance without compromising adaptability.

Key Components of the Fadu Architecture

Four 64-bit RISC-V Cores: Responsible for firmware policy decisions and error handling, but not part of the critical data path.

NAND Channels: ONFI 5.1 NAND channels.

PCIe PHY: PCIe 5.0 x4 interface to the host.

Programmable Control Plane: Custom RTL hardware acceleration blocks paired with many small embedded processors to pipeline NVMe commands, with separate SRAM.

Data Plane: Tasks in the core data path—such as error correction, data protection, and encryption—are fully hardware-accelerated.

SRAM and DRAM: SRAM serves as a buffer for high-throughput writes, while DRAM is utilized for FTL operations.

The Programmable Control Plane

A cornerstone of Fadu’s architecture is the Programmable Control Plane, powered by PPUs (Packet Processing Units): specialized RISC processors optimized for packet processing. This design introduces key advantages:

Deep Pipeline: NVMe commands are broken down and processed in a multi-stage pipeline, with each stage handled by a dedicated PPU core.

Increased Throughput: Pipelining allows for concurrent command processing at different stages, significantly boosting throughput.

Improved Efficiency: By splitting operations into smaller stages, bottlenecks can be resolved efficiently.

Rather than executing all tasks sequentially in firmware, Fadu’s architecture breaks an NVMe command into multiple pipelined stages. Each stage can be optimized independently, ensuring that only the bottlenecks are addressed, rather than overhauling the entire architecture. This yields significant power efficiency and faster adaptation across PCIe generations—from 4.0 to 6.0—without requiring major changes.
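A simple throughput model shows why pipelining helps and why fixing only the bottleneck stage is enough. The stage latencies below are made-up numbers, and the model is idealized: it ignores hand-off overhead between stages and assumes a slow stage can be split into two equal halves.

```python
def sequential_rate(stage_ns: list[float]) -> float:
    """Commands per second when one core runs every stage back to back."""
    return 1e9 / sum(stage_ns)


def pipelined_rate(stage_ns: list[float]) -> float:
    """Steady-state commands per second when each stage has its own processor."""
    return 1e9 / max(stage_ns)


def split_bottleneck(stage_ns: list[float]) -> list[float]:
    """Replace the slowest stage with two stages of half its latency (idealized)."""
    slowest = stage_ns.index(max(stage_ns))
    half = stage_ns[slowest] / 2
    return stage_ns[:slowest] + [half, half] + stage_ns[slowest + 1:]


# Hypothetical per-stage latencies in nanoseconds.
stages = [100.0, 150.0, 400.0, 150.0]

print(f"sequential:         {sequential_rate(stages) / 1e6:.2f} M commands/s")
print(f"pipelined:          {pipelined_rate(stages) / 1e6:.2f} M commands/s")
print(f"bottleneck split:   {pipelined_rate(split_bottleneck(stages)) / 1e6:.2f} M commands/s")
```

In this sketch the sequential design reaches 1.25 M commands/s, the pipeline reaches 2.50 M, and splitting only the 400 ns stage doubles that to 5.00 M without touching the other stages.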

Let’s see how an NVMe read command works with the Fadu programmable control plane.

NVMe Read

⬇️ Command Fetch

⬇️ Command Decoding

⬇️ Command Chopping into FTL IO Size

⬇️ PRP Parsing

⬇️ LBA Dependency Check

⬇️ Look up L2P Info

⬇️ NAND Req. Merge (in multi-plane)

⬇️ Generate NAND Page Read command

If a bottleneck appears in a specific stage when moving to a new PCIe generation, only that stage is modified or pipelined further—providing a more efficient and faster path to market.
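Three of the stages listed above (command chopping, L2P lookup, and multi-plane merge) can be sketched in software to show what each one does. This is only an illustration of the logic, not Fadu's implementation: the names, the 4 KiB mapping unit with 512-byte LBAs, the dictionary-based L2P table, and the (die, plane, page) address format are all assumptions.

```python
from collections import defaultdict
from typing import NamedTuple

# Assumption: 512-byte LBAs and a 4 KiB FTL mapping unit.
SECTORS_PER_FTL_UNIT = 8


class PhysAddr(NamedTuple):
    """Hypothetical physical address format."""
    die: int
    plane: int
    page: int


def chop(start_lba: int, num_sectors: int) -> list[int]:
    """Command chopping: list the FTL mapping units a host LBA range touches."""
    if num_sectors <= 0:
        raise ValueError("num_sectors must be positive")
    first = start_lba // SECTORS_PER_FTL_UNIT
    last = (start_lba + num_sectors - 1) // SECTORS_PER_FTL_UNIT
    return list(range(first, last + 1))


def lookup(l2p: dict[int, PhysAddr], units: list[int]) -> list[PhysAddr]:
    """L2P lookup: translate mapping units to physical addresses (all assumed mapped)."""
    return [l2p[unit] for unit in units]


def merge_multiplane(addrs: list[PhysAddr]) -> list[tuple[int, int, list[int]]]:
    """NAND request merge: combine reads to the same die and page across planes."""
    groups: dict[tuple[int, int], set[int]] = defaultdict(set)
    for addr in addrs:
        groups[(addr.die, addr.page)].add(addr.plane)
    return [(die, page, sorted(planes)) for (die, page), planes in groups.items()]


# Toy mapping table: units 0 and 1 share a page on die 0, different planes.
l2p = {
    0: PhysAddr(die=0, plane=0, page=7),
    1: PhysAddr(die=0, plane=1, page=7),
    2: PhysAddr(die=1, plane=0, page=3),
}

units = chop(start_lba=4, num_sectors=16)
nand_reads = merge_multiplane(lookup(l2p, units))
print(units)
print(nand_reads)
```

Here an unaligned 16-sector read touches three mapping units, and the merge step turns three lookups into two NAND page reads, one of them a two-plane read. In a pipelined design, each of these functions would correspond to its own stage working on a different command at the same time.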

Extensive Hardware Offloading of Common SSD Functions

To maximize performance, Fadu has implemented extensive hardware offloading for common SSD functions that call for a full RTL implementation:

FTL / Garbage Collection

End-to-End Data Protection, LDPC, RAID, and Encryption

Quality of Service Improvements

Another benefit of the Fadu architecture is improved Quality of Service (QoS). The pipelined architecture reduces tail latency and improves performance for mixed workloads. Extensive characterization of NAND from various vendors further tunes latency under different conditions:

Reduced Tail Latency: Critical for hyperscale and AI workloads.

Optimized Latency in Mixed Workloads: Crucial for real-world databases, such as RocksDB and Aerospike, that require high random read performance and predictable low latency.

Why Fadu’s Architecture is the Optimal Solution

Fadu’s SSD architecture redefines traditional designs by excelling in four critical areas:

Scalability: Scaled from PCIe 3.0 to 6.0 with minimal changes to core design, achieving high performance without linearly increasing CPU resources.

Flexibility: Achieves low latency and power efficiency without sacrificing the ability to support new features, such as Flexible Data Placement, and adapt to changing requirements.

Performance: Consistently delivers best-in-class performance, scaling to PCIe 6.0 with 28 GB/s and 6.9M IOPS for random reads.

Power Efficiency: Delivers best-in-class power efficiency, as shown in our previous blog post: Energy-Efficient Controller.
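As a rough sanity check on the performance figures above, the calculation below converts the random-read IOPS number into bandwidth and into an average time budget per command. The 4 KiB transfer size is an assumption; the post does not state the I/O size behind the 6.9M IOPS figure.

```python
def bandwidth_gb_per_s(iops: float, io_bytes: int) -> float:
    """Bandwidth in decimal GB/s implied by an IOPS rate and a fixed I/O size."""
    return iops * io_bytes / 1e9


def per_command_budget_ns(iops: float) -> float:
    """Average time available per command, in nanoseconds, at a given IOPS rate."""
    return 1e9 / iops


RANDOM_READ_IOPS = 6.9e6   # figure quoted in the text
IO_BYTES = 4096            # assumed 4 KiB random reads

print(f"implied bandwidth:  {bandwidth_gb_per_s(RANDOM_READ_IOPS, IO_BYTES):.2f} GB/s")
print(f"per-command budget: {per_command_budget_ns(RANDOM_READ_IOPS):.0f} ns")
```

Under that assumption, 6.9M IOPS works out to about 28.26 GB/s, in line with the 28 GB/s figure, and leaves roughly 145 ns per command on average. A budget that small is hard to meet if every command runs start to finish in firmware on a general-purpose core, which is the motivation for the pipelined control plane.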

Conclusion

The storage industry is at a pivotal moment—old approaches can no longer meet the demands of today’s data-intensive world. It’s time to reinvent SSD architecture, not just to keep pace with technological advancements, but to set the pace.

At Fadu, we’ve embraced this challenge. Our innovative SSD architecture combines scalability and flexibility in ways that redefine the possibilities of SSD performance. By leveraging a programmable control plane with specialized PPUs and extensive hardware offloading, Fadu delivers a solution that is powerful, efficient, and adaptable for the future of storage technology.