FLC is an architecture that redefines memory on modern devices. It offloads traditional memory usage to less expensive flash memory and solid-state drives while using only a small amount of expensive DRAM as cache. This dramatically reduces the size, cost, and power requirements of anything from personal devices to generative-AI-capable servers.
High-bandwidth, moderate-capacity DRAM is inserted as a final-level cache (FLC) to enhance the performance of standard DDR memory. The DDR memory can in turn serve as a massive workload cache that hides the latency of storage (e.g., SSD) used as the final memory.
Optimal combination of bandwidth, latency, capacity, and power dissipation
Economical & energy-efficient way to build a petabyte-scale accessible DRAM/SSD pool
INNOVATIVE & DISRUPTIVE
Very high (>95%) hit rate for the FLC-1 high-bandwidth cache; ~100% hit rate for FLC-2
Fully associative look-up engine with a very large number of entries (e.g., 32K/64K for a 128MB cache)
Large cache line (e.g., 2KB, 4KB, 16KB, or larger)
Multi-level (2 or more) caching
Effective at inspecting & managing (i.e., masking or mapping out) defective or failing memory addresses
Cache DRAM or HBM3 for FLC level 1
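As a rough illustration of the figures in the list above, the sketch below works out how many look-up entries a 128MB cache needs at a given line size and models a toy fully associative cache. The class and function names are hypothetical, and the LRU replacement policy is an assumption made for the example; the text does not say which policy FLC uses.

```python
from collections import OrderedDict

KIB = 1024
MIB = 1024 * KIB


def num_entries(cache_bytes, line_bytes):
    """Number of cache lines (look-up entries) for a cache and line size."""
    return cache_bytes // line_bytes


class FullyAssociativeCache:
    """Toy fully associative cache: any line may occupy any entry.

    ASSUMPTION: LRU replacement is used here only for illustration.
    """

    def __init__(self, cache_bytes, line_bytes):
        self.line_bytes = line_bytes
        self.capacity = num_entries(cache_bytes, line_bytes)
        # Keys are line tags, ordered from least to most recently used.
        self.lines = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Return True on a hit, False on a miss (the line is then filled)."""
        tag = address // self.line_bytes
        if tag in self.lines:
            self.lines.move_to_end(tag)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        self.lines[tag] = True
        return False


entries_4k = num_entries(128 * MIB, 4 * KIB)  # 32K entries
entries_2k = num_entries(128 * MIB, 2 * KIB)  # 64K entries

cache = FullyAssociativeCache(128 * MIB, 4 * KIB)
first = cache.access(0)         # miss: the line is not cached yet
second = cache.access(100)      # hit: same 4KB line as address 0
third = cache.access(4 * KIB)   # miss: next line
```

With 4KB lines a 128MB cache needs 32K entries, and with 2KB lines it needs 64K, which matches the example figures quoted in the list. Larger lines reduce the number of entries the look-up engine has to search.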
Final-Level Cache (FLC): Fundamental High-Memory-Bandwidth Technology
Memory Latency When Fully Active (Without FLC)
Low latency (the published spec, e.g., ~60 ns) when idle
High latency (e.g., >200 ns) when fully active
Why High-Bandwidth FLC Wins
Sufficient bandwidth available in FLC-1 to serve all memory access requests
Economical & energy-efficient way to build a gigantic (petabyte-scale) total accessible DRAM/SSD pool
What Happens When FLC-1 Misses?
Low latency from the almost-idle DDR used for FLC-2
Much lower than a conventional implementation without FLC-1
Low FLC-2 activity (only the few % of the time when FLC-1 misses)
Typical DDR or CXL memory has very high latency when heavily loaded, and CXL memory adds the inherent overhead of CXL on top.
The >95% hit rate of the high-bandwidth cache DRAM significantly reduces latency, as shown in the second graph. Even on an FLC-1 miss, latency remains low because the DDR is lightly utilized, as shown in the third graph.
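The latency argument above can be sketched as a simple weighted average. The hit rate and DDR latencies come from the figures quoted on this page. The FLC-1 hit latency is not stated here, so the value below is a hypothetical placeholder, and the miss cost (FLC-1 look-up followed by an idle DDR access) is a simplifying assumption rather than a measured result.

```python
def average_latency_ns(hit_rate, hit_latency_ns, miss_latency_ns):
    """Expected latency per access for one cache level in front of a backing memory."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_latency_ns + (1.0 - hit_rate) * miss_latency_ns


# Figures quoted in the text.
FLC1_HIT_RATE = 0.95            # ">95%": 0.95 is the conservative end
DDR_IDLE_LATENCY_NS = 60.0      # "~60ns" published spec when idle
DDR_LOADED_LATENCY_NS = 200.0   # ">200ns" when fully active

# ASSUMPTION: hypothetical placeholder, not a published FLC-1 figure.
ASSUMED_FLC1_LATENCY_NS = 60.0

# ASSUMPTION: a miss pays the FLC-1 look-up plus an access to nearly idle DDR.
miss_latency_ns = ASSUMED_FLC1_LATENCY_NS + DDR_IDLE_LATENCY_NS

with_flc_ns = average_latency_ns(FLC1_HIT_RATE, ASSUMED_FLC1_LATENCY_NS, miss_latency_ns)
without_flc_ns = DDR_LOADED_LATENCY_NS

# Only FLC-1 misses reach the DDR, which is why it stays lightly utilized.
ddr_request_fraction = 1.0 - FLC1_HIT_RATE

print(f"with FLC-1:    {with_flc_ns:.1f} ns per access (under the assumptions above)")
print(f"without FLC-1: {without_flc_ns:.1f} ns per access (fully active DDR)")
print(f"requests reaching DDR: {ddr_request_fraction:.0%}")
```

Because only about 5% of requests reach the DDR in this model, it operates close to its idle latency instead of its fully active latency, which is the effect the third graph describes. Swapping in measured FLC-1 and DDR latencies would give a more realistic estimate.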