SLC Cache – How important is it, and how does it affect SSD endurance and performance?

Hello everyone, how are you doing? In today’s article, we will conduct a test to determine the true impact of SLC Cache on everyday SSD usage.

Before we delve into the test results, let’s address a few crucial points right from the start. Firstly, for this article we used a 240GB Crucial BX500 SSD. This particular SSD is equipped with a Silicon Motion SM2259XT controller paired with four Micron TLC dies, specifically B47R models.

Controller
The SSD controller is responsible for data management, over-provisioning, garbage collection, and other background functions, all of which ensure the SSD performs well.

The SSD in question is equipped with a Silicon Motion controller, specifically the SM2259XT. This controller is based on a single-core, 32-bit ARM Cortex®-R5 and is manufactured on TSMC’s 28nm process. The main core operates at a clock speed of 437 MHz. The SM2259XT is commonly found in SSDs targeting the entry-level, cost-effective market segment, including models like the Adata SU630 and Hikvision E100N.

The SM2259XT controller features four communication channels and supports the ONFI 4.0 and Toggle 2.0/3.0/4.0 interfaces; in other words, it can handle a maximum bus speed of 400 MT/s (200 MHz). Each channel can interleave up to four Chip Enable (CE) connections, the command and data pathways that physically connect each die (memory chip) to the controller. Consequently, the controller can address up to four dies per channel, for a total of up to 16 dies to achieve optimal performance.
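The channel figures above give a useful back-of-the-envelope ceiling. A minimal sketch, assuming 4 channels at 400 MT/s on an 8-bit NAND bus; real throughput is lower due to protocol overhead, ECC, and die busy times:

```python
# Theoretical raw NAND bus bandwidth for the SM2259XT layout
# described above (illustrative figures, not vendor specs).
CHANNELS = 4
BUS_MT_S = 400          # mega-transfers per second, per channel
BYTES_PER_TRANSFER = 1  # 8-bit NAND data bus

raw_bw_mb_s = CHANNELS * BUS_MT_S * BYTES_PER_TRANSFER
print(f"Raw NAND bus bandwidth: {raw_bw_mb_s} MB/s")  # 1600 MB/s

# SATA 6 Gbps yields roughly ~550 MB/s usable after 8b/10b
# encoding, so the NAND bus itself is not the bottleneck here;
# die program/read latency and die count are.
SATA_USABLE_MB_S = 550
print(f"Headroom over SATA: {raw_bw_mb_s / SATA_USABLE_MB_S:.1f}x")
```

This is why, on this drive, the interface rather than the NAND bus caps sequential reads.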

In order to maintain consistently high performance, every high-end SSD requires a buffer to store its mapping tables, such as the Flash Translation Layer (FTL) look-up table. This buffer enables improved random performance and responsiveness.

It’s worth noting that this particular SSD is DRAM-less, as indicated by the “XT” designation in the controller model, meaning it has no DRAM cache. Furthermore, since it uses a SATA interface, it cannot fall back on HMB (Host Memory Buffer).

NAND Flash
The 240GB SSD in question is equipped with two NAND flash packages, marked “NY111” or “MT29F1T08EELEEJ4-M:E”. These NAND chips are manufactured by Micron, an American semiconductor company, and belong to the B47R series. Each die has a capacity of 512Gb (64GiB) and stacks 195 total gates, of which 176 are active data layers, resulting in an array efficiency of 90.2%.

In this particular SSD, each NAND flash package contains two dies, each with a density of 512Gb, for a total of 128GB per package. With two packages present, the raw NAND capacity of the SSD amounts to 256GB, of which 240GB is user-addressable. Communication between the NAND and the controller occurs at a bus speed of 400 MT/s (200 MHz).

Each of these dies has 4 planes, so the controller can issue operations to multiple planes of a die at once, increasing parallelism and thus improving performance.
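The capacity and over-provisioning math above can be sketched quickly. This assumes the die figures from this section (4 dies of 512Gib each) and the drive’s 240GB decimal advertised capacity; vendors also reserve space for firmware, mapping tables, and bad blocks, so this is a rough estimate only:

```python
# Rough over-provisioning estimate for this 240GB drive,
# based on the die counts described above (a sketch).
DIES = 4
DIE_GIB = 64                    # 512 Gib = 64 GiB per die
raw_gib = DIES * DIE_GIB        # 256 GiB of raw NAND
raw_gb = raw_gib * 2**30 / 1e9  # convert GiB -> decimal GB

ADVERTISED_GB = 240
op_pct = (raw_gb - ADVERTISED_GB) / ADVERTISED_GB * 100
print(f"Raw NAND: {raw_gb:.1f} GB, over-provisioning: ~{op_pct:.1f}%")
```

Part of that reserved space is what the controller uses for garbage collection and the dynamic pSLC cache.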

HOW THE TEST WAS CONFIGURED

Firstly, once we have identified the controller and NAND Flash used in the SSD, we will utilize MPTools for this particular controller. MPTools, also known as Mass Production Tools, are tools provided directly by the controller manufacturer. These tools enable us to make various logical modifications to the SSD.

Before proceeding with the software modifications, it is necessary to disassemble the SSD and create a short circuit between two terminals on its printed circuit board (PCB). This action puts the SSD into “ROM Mode,” enabling the writing and modification of its firmware.

To achieve this, an SSD Card Opener device was utilized. The SSD Card Opener is a 2.5″ SATA to USB 3.1 Gen 1 adapter with a transfer rate of 5Gbps. It features a bridge chip manufactured by JMicron, specifically the JMS578 model.

The SSD Card Opener can be purchased on AliExpress for approximately $20. With the SSD disassembled and ready, we can now connect it to the USB adapter and then connect it to the computer.

For this test, we will use the same battery of tests as in other reviews. However, this time we will specifically compare the performance of the SSD with SLC Cache against the SSD without SLC Cache. By focusing on this specific comparison, we aim to evaluate the impact of SLC Cache on the SSD’s overall performance.

TEST BENCH
– O.S.: Windows 10 Pro 64-bit (Build: 22H2) + Windows 11 Pro 64-bit (Build: 22H2)
– CPU: AMD Ryzen 9 5950X (16C/32T) (Fixed frequency, 4 GHz)
– RAM: 2 × 16 GB DDR4-3200MHz CL-16 Netac (with XMP)
– Motherboard: Gigabyte X570s Aorus Elite AX (Bios Ver.: F5c)
– GPU: RTX 3050 Gigabyte Gaming OC (Drivers: 531.xx)
– Storage (OS): SSD SK Hynix Platinum P41 2TB (Firmware: 51060A20)
– D.U.T.: SSD Crucial BX500 240GB (Firmware: M6CR056)
– AMD X570 Chipset Driver: 4.03.03.431
– Windows: Indexing disabled
– Windows: Windows Update disabled
– Windows: No antivirus
– Windows: Clean OS install
– pSLC Cache test: drive actively cooled by fans to prevent thermal throttling

CRYSTALDISKMARK
We conducted synthetic sequential and random tests with the following configurations:

Sequential: 2x 1 GiB (Blocks 1 MiB) 8 Queues 1 Thread

Random: 2x 1 GiB (Blocks 4 KiB) 1 Queue 1/2/4/8/16 Threads

The sequential read performance remained at the same level even without the SLC Cache, only the write performance dropped drastically. This is because the dies have sufficient data bus and throughput to saturate the 6Gbps SATA interface in reading, but they are not capable of saturating it in writing. If it were a 480GB or 960GB SSD with 8 or 16 dies, it would be able to comfortably saturate this bus.

In terms of latency, we can see that the dies have low latencies in reading, while the true latencies of the dies in TLC mode during writing are about 10 times higher than in pseudo-SLC mode.

The same can be said for its random speeds. In reading with a queue depth of 4, it still manages to achieve a similar score, as in the overwhelming majority of cases, SLC Cache is designed to improve the sequential and random writing performance of the SSD. However, as we can see, the write performance without the SLC Cache was dismal. Even though it is a modern TLC die, its random performance is terrible due to its high latencies.

When allocating only 1 thread to better represent a typical everyday workload, the same result occurs, but the write performance with the cache disabled was 10 times worse.

ATTO Disk Benchmark QD1 & QD4

We conducted a test using ATTO to measure the speed of the SSDs with different block sizes. In this benchmark, it was configured as follows:

Blocks: From 512 Bytes to 8 MiB

File Size: 256MB

Queue Depth: 1 and 4.

ATTO Disk Benchmark performs a sequential speed test with highly compressible data, simulating a transfer load similar to what we commonly see in Windows. Typically, peak performance appears in the block-size range from 128KB to 1 MiB. At smaller block sizes, the SSD reads at similar speeds whether or not the cache is active, because the dies are capable of saturating the bus.

In the write test, we can see that the SSD once again becomes extremely slow, especially at the very small block sizes typical of operating-system I/O. Only when the block sizes increase significantly, surpassing 64KB, does the SSD begin to regain some of its write speed.

When using a queue depth of 1 (QD1), we observe that the same results are repeated.

3DMark – Storage Benchmark

In this benchmark, various storage-related tests are performed, including game loading tests for games like Call of Duty Black Ops 4 and Overwatch, recording and streaming gameplay at 1080p 60 FPS using OBS, game installations, and file transfers involving game folders.

In this benchmark with more realistic and traditional traces of everyday use, we can observe a significant drop in performance. The bandwidth decreased by almost half, and the latency nearly doubled, resulting in a performance drop of approximately 45%.

PCMARK 10 – FULL SYSTEM DRIVE BENCHMARK
In this test, the Storage Test tool was used, specifically the “Full System Drive Benchmark” test, which performs both light and heavy tests on the SSD.


Among these traces, we can observe tests such as:

  • Boot Windows 10
  • Adobe After Effects: Launching the application until it is ready for use
  • Adobe Illustrator: Launching the application until it is ready for use
  • Adobe Premiere Pro: Launching the application until it is ready for use
  • Adobe Lightroom: Launching the application until it is ready for use
  • Adobe Photoshop: Launching the application until it is ready for use
  • Battlefield V: Loading time until the start menu
  • Call of Duty Black Ops 4: Loading time until the start menu
  • Overwatch: Loading time until the start menu
  • Using Adobe After Effects
  • Using Microsoft Excel
  • Using Adobe Illustrator
  • Using Adobe InDesign
  • Using Microsoft PowerPoint
  • Using Adobe Photoshop (Intensive use)
  • Using Adobe Photoshop (Light use)
  • Copying 4 ISO files, totaling 20GB, from a secondary disk (Write test)
  • Performing read-write copy of the ISO file (Read-write test)
  • Copying the ISO file to a secondary disk (Read)
  • Copying 339 JPEG files (Photos) to the tested disk (Write)
  • Creating copies of these JPEG files (Read-write)
  • Copying 339 JPEG files (Photos) to another disk (Read)

In this other benchmark, which is slightly older and focuses more on productivity, there was another significant drop in performance, although not as large as what we saw in 3DMark. However, in this case, the decrease occurred because the bandwidth dropped by approximately 22%, while the latency increased by about 30%.

TEST PROJECT – Adobe Premiere Pro 2021
Next, we used Adobe Premiere to measure the average time it took to open a project of approximately 16.5GB with a 4K resolution, 120Mbps bitrate, and filled with effects until it was ready for editing. It is worth noting that the tested SSD was always used as a secondary drive without the operating system installed, as this could affect the results and introduce inconsistencies.

When using Premiere to load a project of over 16GB, there was no difference in this benchmark, since loading a project is essentially a read operation.

WINDOWS BOOT TIME AND GAME LOADING TIME
We conducted a comparison between multiple SSDs and an HDD, using a clean installation of Windows 10 Build 21H1 along with the Final Fantasy XIV benchmark in campaign mode. The test recorded the best result of three consecutive system boots, measuring the total time until reaching the desktop with the score reported by the application. This measurement is therefore longer than the boot time to the desktop screen alone.

The same happened during the game loading, where the difference between the SSD with SLC Cache and without SLC Cache was almost imperceptible.

This measurement includes the time from boot until the last OS drivers finish loading; in this case, a clean installation with only operating-system drivers such as network, wireless + Bluetooth, audio, Nvidia, and PCH drivers, among others. Therefore, in this scenario, we can see that there was not a significant difference.

SUSTAINED WRITE SPEED | SLC CACHING
Many SSDs on the market today utilize SLC Caching technology, where a certain percentage of their storage capacity, whether it is MLC (2 bits per cell), TLC (3 bits per cell), or QLC (4 bits per cell), is programmed with only 1 bit per cell. This region serves as a write (and read) buffer: the controller writes incoming data to it first, and once the buffer is full, writes go directly to the native NAND flash (MLC / TLC / QLC).

Through IOmeter, we can get an idea of the SLC cache size of this SSD, as manufacturers often do not disclose this information. Based on the tests we conducted, it was found that it has a dynamic pSLC cache volume of about 24GB. It was able to maintain an average speed of ~ 461MB/s until the end of the buffer, which is a good speed considering this is a SATA SSD.
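A pSLC cache is not free capacity: storing 1 bit per cell in blocks designed for 3 bits means the cache occupies three times its size in native TLC blocks. A quick sketch using the ~24GB cache measured here:

```python
# How much native capacity a dynamic pSLC cache occupies,
# assuming TLC blocks (3 bits/cell) re-used at 1 bit/cell,
# as described above. Figures from the IOmeter measurement.
BITS_NATIVE = 3  # TLC
BITS_PSLC = 1
cache_gb = 24

tlc_consumed_gb = cache_gb * BITS_NATIVE / BITS_PSLC
print(f"A {cache_gb} GB pSLC cache occupies ~{tlc_consumed_gb:.0f} GB "
      "of TLC blocks while full")
```

This is also why the cache is dynamic: as the drive fills, fewer free blocks remain to borrow, and the cache shrinks.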

After writing 24GB, it started to write natively to its TLC blocks at an average speed of approximately 111MB/s, covering a range from 25GB to 192GB, totaling over 160GB.

Then the folding process began, where the blocks that were in pSLC mode were reprogrammed back to TLC, resulting in a significant performance drop. The SSD maintained an average write speed of 81MB/s until it was filled.

The overall average, including the folding process and native writing, was 103MB/s.
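That 103MB/s figure checks out as a time-weighted average of the two post-cache phases described above. A sketch, with approximate phase boundaries:

```python
# Sanity check on the sustained-write average reported above:
# native TLC phase (~24 GB to 192 GB at ~111 MB/s) plus the
# folding phase (~192 GB to full at ~81 MB/s) on a 240 GB drive.
native_gb, native_mb_s = 192 - 24, 111
folding_gb, folding_mb_s = 240 - 192, 81

total_gb = native_gb + folding_gb
total_s = (native_gb * 1000 / native_mb_s
           + folding_gb * 1000 / folding_mb_s)
avg_mb_s = total_gb * 1000 / total_s
print(f"Weighted average after the cache: ~{avg_mb_s:.0f} MB/s")  # ~103
```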

When the SLC cache is disabled, we observe that its sustained write speed starts in the range of 240MB/s and continues up to nearly 90GB. After that point, the speed drops to an average of 100MB/s, which represents the sustained speed of this SSD.

FILE TRANSFER TEST

In this test, we copied ISO files and the CSGO game from a RAM Disk to the SSD to assess its performance. We used the Windows 10 21H1 ISO file, which is 6.25GB in size (1 file), and its version extracted with WinRAR, a folder containing 1,874 smaller files. Additionally, we copied the CSGO installation folder, which is 25.2GB in size.

When using the Windows 10 .ISO image, we observed that without the SLC Cache, the SSD had significantly lower performance.

The same results happened again with these other files.

TEMPERATURE

In this section of the analysis, we will test the temperature of the SSD during a stress test where continuous files are transferred to the SSD. This will help us determine if there is any thermal throttling occurring with its internal components that could potentially result in performance bottlenecks or loss.

Since these are SATA SSDs, they don’t heat up much to begin with, and in this test we can see that there was no significant temperature difference with or without the SLC Cache.

POWER DRAW AND EFFICIENCY


Special thanks to Quarch Solutions for sending over this unit!

In this section of the analysis, we will be using the Quarch Programmable Power Module provided by Quarch Solutions to conduct tests and evaluate the efficiency of the SSD. We will perform three tests: maximum power consumption of the SSD, average power consumption in practical and casual scenarios, and power consumption at idle.

This set of tests, especially the efficiency and idle power consumption tests, is important for users who intend to use SSDs in laptops. SSDs spend the majority of their time in low-power states (idle), so this information is valuable in terms of battery saving.

In terms of efficiency, we can see that it decreases significantly without the SLC Cache. The time it took to transfer the file folder, along with the average speed, was lower when the SLC Cache was disabled, resulting in lower efficiency.

Regarding their maximum power consumption, in both scenarios, the SSDs had identical power consumption.

The same occurred in their average power consumption, where the difference was almost negligible.

Lastly, and most importantly, in the Idle test, which represents the scenario where the majority of SSDs are in everyday use, we can see that without the SLC Cache, the SSD had significantly lower power consumption.

Conclusion

From this brief article, we can conclude that SLC Cache does indeed contribute significantly to the performance of an SSD, allowing it to offer satisfactory performance. However, like everything, there are disadvantages.

One of the main drawbacks of not using SLC Cache is that it becomes impossible to achieve the level of durability that the manufacturer claims. In some cases, manufacturers use NAND flash chips that have not undergone all quality control tests and may have a higher number of bad blocks or lower durability than normal. To mitigate this, the SSD may use a portion of the NAND as a static SLC Cache, so as to avoid a significant impact on the overall durability of the drive.

As stated in the datasheet of Micron’s B16A FortisFlash, a 64-layer 256Gb TLC die, in its native TLC mode it can handle an estimated 1,500 program/erase cycles, which represents the number of times each cell within each page can be rewritten after being erased. However, when this same die operates in pseudo-SLC mode, its durability increases dramatically, reaching over 40,000 program/erase cycles.
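Those P/E figures translate into endurance via the standard formula TBW = capacity × P/E cycles ÷ WAF. A sketch using the 1,500-cycle figure cited above; the write amplification factor here is an illustrative assumption, not a measured value for this drive:

```python
# Rough endurance (TBW) estimate from P/E cycles.
capacity_tb = 0.240       # 240 GB drive
pe_cycles_tlc = 1_500     # datasheet figure cited above
pe_cycles_pslc = 40_000   # datasheet figure cited above
assumed_waf = 4.0         # hypothetical write amplification

tbw_tlc = capacity_tb * pe_cycles_tlc / assumed_waf
print(f"TLC mode:  ~{tbw_tlc:.0f} TBW")   # ~90 TBW

# The same cells in pSLC mode endure far more cycles but hold
# only a third of the data per cell:
tbw_pslc = (capacity_tb / 3) * pe_cycles_pslc / assumed_waf
print(f"pSLC mode: ~{tbw_pslc:.0f} TBW")  # ~800 TBW
```

This is why a static pSLC region can absorb a disproportionate share of writes without eating into the drive’s rated endurance.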

Another point that is easier to understand is that in the vast majority of cases, consumer-grade SSDs cannot achieve the sequential and random speeds claimed by the manufacturer. In SATA SSDs, it is possible to saturate the sequential bus in certain scenarios, as these dies perform better in such situations. However, this only occurs with SSDs of a certain capacity that have enough dies to saturate the bus.

There are indeed positive aspects, such as the fact that with the cache deactivated, Write Amplification decreases. I will soon be bringing a series of articles where I will explain in much more detail the architecture of SSDs.

In summary, Write Amplification is a phenomenon that occurs in SSDs and is related to how data is written and managed in the device. When an SSD receives a write request, it needs to perform internal operations such as Wear Leveling and Garbage Collection to ensure the optimal utilization of the NAND flash memory. However, these additional operations can result in write amplification, where more data is physically written to the memory than necessary. This can reduce the speed and lifespan of the SSD.
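The definition above boils down to a simple ratio: bytes physically written to NAND divided by bytes the host asked to write. The values below are illustrative, not measurements from this drive:

```python
# Write Amplification Factor (WAF) as described above.
def waf(nand_writes_gb: float, host_writes_gb: float) -> float:
    """Ratio of physical NAND writes to host writes (>= 1.0 ideally)."""
    return nand_writes_gb / host_writes_gb

# e.g. garbage collection rewrote 60 GB of valid data while
# the host wrote 100 GB, so 160 GB hit the NAND in total:
print(f"WAF = {waf(160, 100):.1f}")  # 1.6
```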

One of the causes of write amplification is data fragmentation. As data is written and erased on the SSD, empty spaces are created, leading to fragmentation of data blocks. When new data is written, the SSD needs to perform a series of operations to find and allocate contiguous free blocks, resulting in more physical writes than necessary. This fragmentation can occur especially with random writes or with intensive SSD usage.

Furthermore, the management of free space on the SSD also contributes to write amplification. Since the SSD needs to maintain a minimum amount of free space for write operations and proper device functioning, the actual capacity available for data storage may be lower than the advertised total capacity. As the SSD fills up, the available free space decreases, which can lead to higher write amplification as the device needs to perform more internal operations to optimize the available space.

But that’s it, folks! I hope you enjoyed this brief article. In the future, I will try to bring a continuation, focusing on NVMe SSDs, which tend to experience even greater performance degradation. However, I must emphasize that you should not intentionally engage in such practices as it will significantly harm your SSDs! 😀
