Introduction: The Need for a New SSD Architecture
The storage industry is undergoing a significant transformation. With the rise of artificial intelligence (AI) and the introduction of high-speed interfaces like PCIe 5.0 and 6.0, traditional SSD architectures are being pushed beyond their limits. These advancements necessitate a fundamental shift in the design and construction of storage solutions.
For years, the default method of improving SSD performance has been to add more powerful Arm cores. While effective to an extent, this approach has limitations: it increases power consumption, raises costs, and eventually hits diminishing returns. Moving all SSD operations to hardware automation may sound appealing, but it can limit flexibility, preventing rapid adaptation to new technologies and standards.
In this blog post, we explore how Fadu has reimagined SSD architecture to overcome these challenges, providing a scalable, flexible, high-performance, and power-efficient solution.
The Scalability Challenge in Conventional SSDs
The typical approach to scaling SSD performance involves adding more or higher-performance Arm cores. At first glance, this strategy seems logical: more cores should mean more performance. However, it introduces significant limitations:
- Increased Cost and Die Size: Adding cores to keep up with PCIe bandwidth demands inflates both cost and physical footprint.
- Diminishing Returns: More cores do not guarantee proportional performance gains. Data pathways can become bottlenecked by unbalanced loads, bus congestion, interrupt latency, and context switching.
- Higher Power Consumption: Increasing the number or power of cores naturally leads to greater energy consumption.
Rather than solving scalability at its root, this method adds complexity, hindering overall system performance.
Typical SSD Controller Design
Traditional SSD controllers incorporate some hardware automation, but the majority of the Flash Translation Layer (FTL) and core data path reside in firmware, managed by Arm CPUs. This introduces inefficiencies, particularly with newer PCIe generations.
Figure: Adding Arm cores to scale performance (Source: Marvell)
For instance, a recent FMS presentation by Qingru Meng of Solidigm showed that NAND consumes up to half of a PCIe 5.0 SSD’s power, with the share rising to as much as 70% for future PCIe 6.0 SSDs. She points out that scaling Arm cores in a traditional architecture would push total SSD power far beyond the TDP of modern form factors.
Source: SSD Controller Scalability to PCIe Gen6 and Beyond
To meet these new performance thresholds while maintaining efficiency, SSD controllers must evolve.
The Flexibility Problem with Over-Automation
The opposite approach—full hardware automation—also presents issues. SSD firmware is inherently complex, and relying solely on hardware significantly reduces flexibility. This has several drawbacks:
- Limited Firmware Development: Over-automated systems make implementing new features or fixing bugs difficult without major hardware modifications.
- NAND Support Constraints: When new NAND flash types are introduced, rigid architectures struggle to provide timely support, delaying product rollouts.
- Inflexibility to Adapt: Inability to adapt quickly to new protocols or standards risks obsolescence in the fast-moving SSD market.
Excessive hardware automation constrains developers, reducing the potential for innovation and adaptability—critical needs in today’s AI-driven landscape.
| Architecture | Hardware acceleration | Flexibility (NAND, new features, bugs) | Programmable, pipelined control plane | Power efficiency | Performance scalability |
|---|---|---|---|---|---|
| Conventional SSD (many Arm cores, FW FTL) | Low | High | No | Bad | Bad |
| RTL-based HW acceleration | High | Low | No | Good | Good |
| FADU architecture | High | High | Yes | Good | Good |
Table 1: Comparing performance, power efficiency, and flexibility
Reinventing SSD Architecture: Fadu’s Approach
Fadu founder and CTO Peter Nam, a leading researcher in SSD technology, and his team pioneered a new approach to SSD architecture. The result? A programmable control plane comprising multiple tiny embedded processors to pipeline commands, combined with intelligent hardware offloading for common functions like NVMe and FTL/garbage collection. This balance enhances performance without compromising adaptability.
Key Components of the Fadu Architecture
- Four 64-bit RISC-V Cores: Responsible for firmware policy decisions and error handling, but not part of the critical data path.
- NAND Channels: ONFI 5.1 NAND channels.
- PCIe PHY: PCIe 5.0 x4 interface to the host.
- Programmable Control Plane: Custom RTL hardware acceleration blocks paired with many small embedded processors to pipeline NVMe commands, with separate SRAM.
- Data Plane: Tasks in the core data path—such as error correction, data protection, and encryption—are fully hardware-accelerated.
- SRAM and DRAM: SRAM serves as a buffer for high-throughput writes, while DRAM is utilized for FTL operations.
The Programmable Control Plane
A cornerstone of Fadu’s architecture is the Programmable Control Plane, powered by PPUs (Packet Processing Units): specialized RISC processors optimized for packet processing. This design introduces key advantages:
- Deep Pipeline: NVMe commands are broken down and processed in a multi-stage pipeline, with each stage handled by a dedicated PPU core.
- Increased Throughput: Pipelining allows for concurrent command processing at different stages, significantly boosting throughput.
- Improved Efficiency: Splitting operations into smaller stages lets bottlenecks be isolated and resolved one stage at a time.
Rather than executing all tasks sequentially in firmware, Fadu’s architecture breaks an NVMe command into multiple pipelined stages. Each stage can be optimized independently, ensuring that only the bottlenecks are addressed, rather than overhauling the entire architecture. This yields significant power efficiency and faster adaptation across PCIe generations—from 4.0 to 6.0—without requiring major changes.
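The throughput argument can be made concrete with a small timing model. The sketch below is illustrative only: the stage times are hypothetical numbers chosen for the example, not Fadu measurements, and the function names are ours. It compares one processor running every stage in turn against a pipeline with one processor per stage, where steady-state throughput is set by the slowest stage.

```python
from typing import Sequence


def sequential_time_ns(stage_ns: Sequence[int], n_commands: int) -> int:
    """Total time when a single processor runs every stage of each command in turn."""
    return n_commands * sum(stage_ns)


def pipelined_time_ns(stage_ns: Sequence[int], n_commands: int) -> int:
    """Total time when each stage has its own processor.

    The first command pays the full pipeline latency; after that, one
    command completes every max(stage_ns), the slowest stage's time.
    """
    if n_commands == 0:
        return 0
    return sum(stage_ns) + (n_commands - 1) * max(stage_ns)


def bottleneck_stage(stage_ns: Sequence[int]) -> int:
    """Index of the slowest stage, which limits pipelined throughput."""
    return max(range(len(stage_ns)), key=lambda i: stage_ns[i])


# Hypothetical per-stage costs in nanoseconds (assumed values, illustration only).
stages_ns = [200, 100, 100, 300, 100, 400, 100, 200]

seq = sequential_time_ns(stages_ns, 1000)    # 1_500_000 ns
pipe = pipelined_time_ns(stages_ns, 1000)    # 401_100 ns
slowest = bottleneck_stage(stages_ns)        # index 5

# Halving only the slowest stage moves the bottleneck to the 300 ns stage.
tuned_ns = list(stages_ns)
tuned_ns[slowest] //= 2
tuned = pipelined_time_ns(tuned_ns, 1000)    # 301_000 ns
```

In this model, speeding up any stage other than the slowest one leaves throughput unchanged, which is why only the bottleneck stage needs attention.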
Let’s see how an NVMe read command works with the Fadu programmable control plane.
NVMe Read
- ⬇️ Command Fetch
- ⬇️ Command Decoding
- ⬇️ Command Chopping into FTL IO Size
- ⬇️ PRP Parsing
- ⬇️ LBA Dependency Check
- ⬇️ Look up L2P Info
- ⬇️ NAND Req. Merge (in multi-plane)
- ⬇️ Generate NAND Page Read command
If a bottleneck appears in a specific stage when moving to a new PCIe generation, only that stage is modified or pipelined further—providing a more efficient and faster path to market.
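Two of the stages above, command chopping and the L2P lookup, can be sketched in a few lines. This is a simplified software illustration under stated assumptions, not Fadu's implementation: the source does not give the FTL IO size, so the sketch assumes a 4 KiB unit with 512-byte LBAs (8 LBAs per unit), assumes pieces are aligned to unit boundaries, and models the L2P table as a plain dictionary. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

FTL_IO_LBAS = 8  # assumed: 4 KiB FTL unit / 512-byte LBA


@dataclass(frozen=True)
class ReadCommand:
    start_lba: int
    num_lbas: int


def chop(cmd: ReadCommand, unit: int = FTL_IO_LBAS) -> List[Tuple[int, int]]:
    """Split a command into (start_lba, num_lbas) pieces that never cross
    an FTL-unit boundary."""
    pieces = []
    lba = cmd.start_lba
    end = cmd.start_lba + cmd.num_lbas
    while lba < end:
        boundary = (lba // unit + 1) * unit  # start of the next FTL unit
        stop = min(boundary, end)
        pieces.append((lba, stop - lba))
        lba = stop
    return pieces


def lookup_l2p(pieces: List[Tuple[int, int]],
               l2p: Dict[int, int],
               unit: int = FTL_IO_LBAS) -> List[Optional[int]]:
    """Return the physical address for each piece, or None if unmapped."""
    return [l2p.get(lba // unit) for lba, _ in pieces]


# A 20-LBA read starting at LBA 6 touches four FTL units.
cmd = ReadCommand(start_lba=6, num_lbas=20)
pieces = chop(cmd)
# pieces == [(6, 2), (8, 8), (16, 8), (24, 2)]

# Hypothetical mapping table: logical unit index -> physical page address.
l2p_table = {0: 1000, 1: 1001, 3: 1003}
physical = lookup_l2p(pieces, l2p_table)
# physical == [1000, 1001, None, 1003]  (unit 2 was never written)
```

Because each function consumes only the output of the one before it, the two steps could run as separate pipeline stages on different commands at the same time.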
Extensive Hardware Offloading of Common SSD Functions
To maximize performance, Fadu offloads common SSD functions to hardware, implementing them fully in RTL:
- FTL / Garbage Collection
- End-to-End Data Protection, LDPC, RAID, and Encryption
Quality of Service Improvements
Another benefit of the Fadu architecture is improved Quality of Service (QoS). The pipelined architecture reduces tail latency and optimizes performance for mixed workloads, and extensive characterization of NAND from various vendors further tunes latency under different conditions:
- Reduced Tail Latency: Critical for hyperscale and AI workloads.
- Optimized Latency in Mixed Workloads: Crucial for real-world databases, such as RocksDB and Aerospike, that require high random read performance and predictable low latency.
Why Fadu’s Architecture is the Optimal Solution
Fadu’s SSD architecture redefines traditional designs by excelling in four critical areas:
- Scalability: Scaled from PCIe 3.0 to 6.0 with minimal changes to core design, achieving high performance without linearly increasing CPU resources.
- Flexibility: Achieves low latency and power efficiency without sacrificing the ability to support new features and adapt to changing requirements, such as Flexible Data Placement.
- Performance: Consistently delivers best-in-class performance, scaling to PCIe 6.0 with 28GB/s and 6.9M IOPS for random reads.
- Power Efficiency: Delivers best-in-class power efficiency, as shown in our previous blog post: Energy-Efficient Controller.
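As a rough sanity check on the performance figures, the arithmetic below converts the random-read IOPS number into bandwidth. It rests on assumptions the post does not state: that the 6.9M IOPS figure refers to 4 KiB reads, and that the PCIe 6.0 link is x4. The raw link rate ignores encoding and protocol overhead, and the 28GB/s figure may well describe sequential throughput rather than random reads.

```python
IOPS = 6_900_000          # random-read figure quoted above
IO_BYTES = 4096           # assumed: 4 KiB per read
GT_PER_LANE = 64          # PCIe 6.0 raw rate per lane, in GT/s
LANES = 4                 # assumed: x4 link

# Bandwidth implied by the IOPS figure, in decimal GB/s.
random_read_gb_s = IOPS * IO_BYTES / 1e9          # 28.2624

# Raw per-direction link rate in GB/s (1 bit per transfer per lane).
raw_link_gb_s = GT_PER_LANE * LANES / 8           # 32.0

print(f"{random_read_gb_s:.1f} GB/s implied, {raw_link_gb_s:.0f} GB/s raw link")
```

Under these assumptions, the IOPS figure implies roughly the same bandwidth as the quoted 28GB/s, and both sit below the raw rate of a x4 PCIe 6.0 link.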
Conclusion
The storage industry is at a pivotal moment—old approaches can no longer meet the demands of today’s data-intensive world. It’s time to reinvent SSD architecture, not just to keep pace with technological advancements, but to set the pace.
At Fadu, we’ve embraced this challenge. Our innovative SSD architecture combines scalability and flexibility in ways that redefine the possibilities of SSD performance. By leveraging a programmable control plane with specialized PPUs and extensive hardware offloading, Fadu delivers a solution that is powerful, efficient, and adaptable for the future of storage technology.