Q:   WHAT IS SCM AND WHY IS IT IMPORTANT?

SCM (Storage Class Memory) is a category of emerging memories that combines memory-like speed with storage-like capacity and persistence.  It offers something close to the best of both worlds: speed and an interface like DRAM, with capacity and non-volatility like NAND.  SCM was a dream device for decades, but it was not until Intel/Micron announced 3D XPoint™ (3DXP) in 2015 that people started to pay serious attention to it.  In their 2015 announcement, Intel/Micron touted 3DXP as “1000 times faster than NAND and 10 times denser than DRAM”.

 

Q:   WE KNOW DRAM DOES NOT NEED A CONTROLLER AT THE DEVICE SIDE, WHILE NAND DOES.  HOW ABOUT SCM?

The important question is the “write endurance” of the device, meaning how many times a particular location can be written before it can no longer store data reliably.  DRAM can sustain about 10^15 writes, which is nearly infinite, while NAND can sustain only around 10^3.  SCM is better than NAND in write endurance, sustaining 10^5 to 10^8 writes depending on the technology.  Even so, that endurance is finite, so a controller is still needed to perform address translation.  In other words, the SCM device must map the host address to a different physical address, and keep changing that mapping over time so that writes are spread evenly across the device.  This process is often referred to as “wear leveling”.
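To make the endurance figures above concrete, here is a rough back-of-the-envelope sketch.  The endurance values come from the answer above; the write rate and the number of locations are hypothetical numbers chosen only for illustration, and the "ideal" wear leveling assumed here spreads writes perfectly evenly, which no real controller achieves.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def seconds_until_worn_out(endurance_cycles, writes_per_second):
    """Time until one location wears out if every write hits that location."""
    return endurance_cycles / writes_per_second

def ideal_leveled_lifetime_years(endurance_cycles, n_locations, writes_per_second):
    """Lifetime if the same write stream is spread evenly over all locations."""
    total_writes = endurance_cycles * n_locations
    return total_writes / writes_per_second / SECONDS_PER_YEAR

# Endurance figures quoted in the text (SCM is given as a range).
ENDURANCE = {"NAND": 1e3, "SCM (low)": 1e5, "SCM (high)": 1e8, "DRAM": 1e15}

HOT_WRITES_PER_SECOND = 1e6   # hypothetical: one location written 1M times/s
N_LOCATIONS = 2 ** 30         # hypothetical device with 2^30 writable locations

for name, cycles in ENDURANCE.items():
    t = seconds_until_worn_out(cycles, HOT_WRITES_PER_SECOND)
    print(f"{name}: hot location lasts {t:g} s without wear leveling")

years = ideal_leveled_lifetime_years(1e6, N_LOCATIONS, HOT_WRITES_PER_SECOND)
print(f"SCM at 1e6 cycles with ideal leveling: about {years:.0f} years")
```

Under these assumed numbers, an unleveled hot location in SCM fails within seconds to minutes, while the same write stream spread across the whole device lasts for decades.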

 

Q:   HOW IS AN SCM CONTROLLER DIFFERENT FROM A NAND CONTROLLER, IF BOTH NEED TO DO WEAR LEVELING?

Traditional NAND controllers employ an FTL (flash translation layer), which uses an address translation table (“table” for short in the following) to map between host address and physical address.  The table access is an extra step.  For a read, a table access is needed first to get the physical address from which to read the data.  For a write, a table update is needed at the end of the transaction to record the new physical address mapping.  One can store the table in DRAM (faster, more expensive), in NAND (slower, cheaper), or in a combination of both.  Another cost factor for storing the table in DRAM is that during a power outage, the DRAM content needs to be saved to NAND, often requiring a Super-cap to hold the power temporarily.  The Super-cap can be expensive, and sometimes bulky if the DRAM table is sizable.  Because of these challenges, a couple of years ago Wolley started exploring a table-less SCM controller architecture.  We believe a table-based FTL is a good choice for a NAND controller, but a table-less controller architecture is better for SCM.
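The extra table step described above can be sketched as follows.  This is a minimal illustration of a table-based translation layer, not any vendor's FTL: the class name, the `table_accesses` counter, and the simple "next free location" allocator are assumptions made for the example, and garbage collection is left out.

```python
class TableBasedController:
    """Toy table-based translation layer: host address -> physical address."""

    def __init__(self, n_physical):
        self.table = {}                    # the address translation table
        self.media = [None] * n_physical   # stands in for the NAND array
        self.next_free = 0                 # hypothetical append-only allocator
        self.table_accesses = 0            # counts the extra table steps

    def read(self, host_addr):
        # Step 1 (extra): look up the physical address in the table.
        self.table_accesses += 1
        physical = self.table[host_addr]
        # Step 2: read the data from the media.
        return self.media[physical]

    def write(self, host_addr, data):
        if self.next_free >= len(self.media):
            raise RuntimeError("no free location (garbage collection omitted)")
        # Step 1: write the data to a new physical location.
        physical = self.next_free
        self.next_free += 1
        self.media[physical] = data
        # Step 2 (extra): record the new mapping at the end of the transaction.
        self.table[host_addr] = physical
        self.table_accesses += 1


ctrl = TableBasedController(n_physical=8)
ctrl.write(7, "old")
ctrl.write(7, "new")          # same host address, new physical location
print(ctrl.read(7), ctrl.table[7], ctrl.table_accesses)
```

Every host read and write costs one table access on top of the media access.  If the table lives in the media itself, that access is a second media operation; if it lives in DRAM, it has to be protected against power loss.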

 

Q:   WHAT ARE THE ADVANTAGES OF A TABLE-LESS CONTROLLER ARCHITECTURE?

Well, cost and performance.  Eliminating the table gets rid of the DRAM and Super-cap, if the table would otherwise be stored externally.  It also eliminates the extra SCM access step, if the table would otherwise be stored inside the SCM, and hence the performance is better.

 

Q:   CAN A TABLE-LESS CONTROLLER ARCHITECTURE BE APPLIED TO NAND?

Unfortunately no.  This goes back to the fundamental device characteristics.  NAND cannot write “in-place”.  Take a 4KB host write into a 16KB page: writing in place would mean reading the remaining 12KB, erasing the page, then writing back the full 16KB.  Instead, the NAND host write is always written to a new location, and the 4KB inside the original page is marked “invalid”.  There is also a background garbage collection process that cleans up the pages by collecting and compacting the remaining valid data.  The table-less controller, by contrast, relies on doing in-place writes.  SCM can do in-place writes; it just cannot keep writing to the same location forever, due to its write endurance limitation.
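The out-of-place write and garbage collection behavior described above can be sketched roughly as below.  The 4KB and 16KB sizes come from the answer; everything else (class and method names, the victim selection rule, relocating survivors back into the erased page) is a simplification assumed for illustration.  In particular, the sketch erases one page at a time, following the description above, whereas real NAND erases whole blocks of many pages.

```python
SLOTS_PER_PAGE = 4   # a 16KB page holds four 4KB host writes (sizes from the text)

class OutOfPlaceStore:
    """Toy model: writes always go to a fresh slot; old copies become invalid."""

    def __init__(self, n_pages):
        # A slot holds (host_addr, data) once written, or None when erased.
        self.pages = [[None] * SLOTS_PER_PAGE for _ in range(n_pages)]
        self.valid = [[False] * SLOTS_PER_PAGE for _ in range(n_pages)]
        self.where = {}                    # host_addr -> (page, slot)
        self.erase_count = [0] * n_pages

    def _free_slot(self):
        for p, page in enumerate(self.pages):
            for s, entry in enumerate(page):
                if entry is None:
                    return p, s
        return None

    def _stale(self, p):
        # Slots that were written but no longer hold the latest copy.
        return sum(1 for s in range(SLOTS_PER_PAGE)
                   if self.pages[p][s] is not None and not self.valid[p][s])

    def garbage_collect(self):
        victim = max(range(len(self.pages)), key=self._stale)
        if self._stale(victim) == 0:
            return                          # nothing to reclaim
        survivors = [self.pages[victim][s] for s in range(SLOTS_PER_PAGE)
                     if self.valid[victim][s]]
        self.pages[victim] = [None] * SLOTS_PER_PAGE      # erase
        self.valid[victim] = [False] * SLOTS_PER_PAGE
        self.erase_count[victim] += 1
        for s, (addr, data) in enumerate(survivors):      # compact valid data
            self.pages[victim][s] = (addr, data)
            self.valid[victim][s] = True
            self.where[addr] = (victim, s)

    def write(self, host_addr, data):
        old = self.where.pop(host_addr, None)
        if old is not None:
            self.valid[old[0]][old[1]] = False   # mark the old 4KB "invalid"
        loc = self._free_slot()
        if loc is None:
            self.garbage_collect()
            loc = self._free_slot()
            if loc is None:
                raise RuntimeError("device full")
        p, s = loc
        self.pages[p][s] = (host_addr, data)
        self.valid[p][s] = True
        self.where[host_addr] = (p, s)

    def read(self, host_addr):
        p, s = self.where[host_addr]
        return self.pages[p][s][1]
```

Note that the `where` dictionary is itself a translation table: once data moves on every write, the controller has to remember where it went, which is why this write model does not fit a table-less design.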

 

Q:   IS WOLLEY THE FIRST TO PROPOSE A TABLE-LESS CONTROLLER FOR SCM?

Actually no.  In 2009, IBM Research published a paper on a “start-gap” wear leveling scheme for PCM (phase change memory, which is one form of SCM).  The start-gap architecture is also table-less.  But we believe it has too many implementation issues to be practical.
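As background on the start-gap idea mentioned above, here is a simplified sketch of the scheme as it is commonly described: N logical lines are kept in N+1 physical lines, and two registers (Start and Gap) replace the translation table.  Every few writes the empty "gap" line moves down by one, so the whole address space slowly rotates through the physical lines.  The class name, the `gap_interval` parameter, and the list-based memory are assumptions for illustration.  This is not Wolley's architecture, and it omits refinements such as address randomization.

```python
class StartGap:
    """Simplified start-gap wear leveling: no table, only two registers."""

    def __init__(self, n_lines, gap_interval=100):
        self.n = n_lines
        self.start = 0                 # Start register
        self.gap = n_lines             # Gap register: index of the empty line
        self.gap_interval = gap_interval
        self.writes = 0
        self.mem = [None] * (n_lines + 1)   # N lines plus one spare
        self.wear = [0] * (n_lines + 1)     # per-line write counts

    def _physical(self, logical):
        # Algebraic mapping instead of a table lookup.
        pa = (logical + self.start) % self.n
        return pa + 1 if pa >= self.gap else pa

    def read(self, logical):
        return self.mem[self._physical(logical)]

    def write(self, logical, data):
        pa = self._physical(logical)
        self.mem[pa] = data                 # in-place write
        self.wear[pa] += 1
        self.writes += 1
        if self.writes % self.gap_interval == 0:
            self._move_gap()

    def _move_gap(self):
        if self.gap == 0:
            # Wrap around: the last line moves to line 0, Start advances.
            self.mem[0] = self.mem[self.n]
            self.wear[0] += 1
            self.gap = self.n
            self.start = (self.start + 1) % self.n
        else:
            # The line just before the gap moves into the gap.
            self.mem[self.gap] = self.mem[self.gap - 1]
            self.wear[self.gap] += 1
            self.gap -= 1


sg = StartGap(n_lines=8, gap_interval=3)
for la in range(8):
    sg.write(la, f"d{la}")
for i in range(200):
    sg.write(0, f"hot{i}")      # hammer one logical address
print(sg.read(0), sg.wear)
```

Even though a single logical address receives all 200 hot writes, they land on several different physical lines, and no table has to be stored or saved on power loss.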

 

Q:   HAS WOLLEY IMPLEMENTED YOUR CONTROLLER?

We have implemented our table-less SCM controller in an FPGA and obtained very good results.  On our FPGA platform, we use DRAM to emulate SCM.  Inside the FPGA, we can add delay to account for the higher SCM latency.  Our FPGA has a PCIe Gen3 x8 host interface.  In memory (64B) mode, we achieve 16M random read IOPS and 10M random write IOPS.  In storage (4KB) NVMe mode, we achieve 800K random read IOPS and 700K random write IOPS.  One of the key advantages of a table-less architecture is very low latency.  For 4KB, our projected (with an ASIC controller) read latency is less than 6us, and the write latency is about 7us.  We also validated our sudden power outage recovery (SPOR) capability, so our hardware and firmware are very robust.
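To put the IOPS figures above in perspective, the short calculation below converts them to data throughput and compares them with the raw PCIe Gen3 x8 link rate.  It assumes "M" and "K" mean 10^6 and 10^3, that 4KB means 4096 bytes, and that the link rate is the raw figure after 128b/130b encoding but before packet and protocol overhead, so the usable bandwidth is somewhat lower.

```python
def throughput_gb_per_s(iops, bytes_per_io):
    """Payload throughput in GB/s (1 GB = 1e9 bytes)."""
    return iops * bytes_per_io / 1e9

# Figures quoted in the text: (mode, operation) -> (IOPS, bytes per I/O)
RESULTS = {
    ("memory 64B", "read"): (16e6, 64),
    ("memory 64B", "write"): (10e6, 64),
    ("storage 4KB", "read"): (800e3, 4096),
    ("storage 4KB", "write"): (700e3, 4096),
}

# PCIe Gen3: 8 GT/s per lane, 128b/130b encoding, 8 lanes, 8 bits per byte.
PCIE_GEN3_X8_GB_PER_S = 8 * 8e9 * (128 / 130) / 8 / 1e9

for (mode, op), (iops, size) in RESULTS.items():
    gbps = throughput_gb_per_s(iops, size)
    share = 100 * gbps / PCIE_GEN3_X8_GB_PER_S
    print(f"{mode} {op}: {gbps:.3f} GB/s ({share:.0f}% of raw link rate)")
```

Under these assumptions, the 4KB results correspond to roughly 3.3 GB/s for reads and 2.9 GB/s for writes, while the 64B results move less data but at a far higher operation rate.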

 

Q:   HAS WOLLEY BEEN TALKING TO SCM MEMORY VENDORS ABOUT YOUR PROPOSAL?  HOW WAS YOUR STORY RECEIVED?

We talked to SCM memory vendors on and off in the past, but starting in 2019 we began engaging with them more actively.  There were two reasons for the timing.  One is that we received our US patent grant earlier in 2019.  The other is that our FPGA implementation and performance results are now more mature.  So far, the dialog and engagement with SCM memory vendors have been very encouraging.