DeepSeek AI has made its Fire-Flyer File System (3FS) open-source this week as part of the company’s Open Source Week event. The Chinese AI firm claims 3FS can achieve an aggregate read throughput of 7.3 TB/s in its own server data clusters, where DeepSeek has been using the system since at least 2019.
3FS is a Linux-based parallel file system designed for AI-HPC operations. These systems involve many data storage servers that are constantly accessed by GPU nodes for training large language models (LLMs). What sets 3FS apart is its nearly singular focus on random read speeds, prioritizing them above all else and largely ignoring read caching.
When training AI models, compute units constantly fetch random samples of training data, and each sample is typically read only once. Read caches are therefore largely ineffective, and 3FS minimizes them. In fact, using a read cache when training LLMs may even be detrimental: if the same data is served repeatedly, the model can overweight it and potentially link unrelated data within the language model.
The team operating one of DeepSeek’s deep learning clusters, Fire-Flyer 2, published a paper last August that detailed the use of 3FS in the custom-built system. In Fire-Flyer 2, DeepSeek utilized 180 storage nodes, each equipped with 16 16TB SSDs and two 200 Gbps NICs. These nodes served 10,000 PCIe Nvidia A100 GPUs, built on more economical servers compared to Nvidia’s proprietary DGX-A100 products.
Across the entire array, DeepSeek reports that it benchmarked 3FS at 6.6 TB/s while simultaneously running training tasks that added another 1.4 TB/s of read throughput. In comparison, the competing file system Ceph reached a read throughput of 1.1 TB/s in early 2024 on a cluster of 68 nodes, each equipped with 10 16TB SSDs and 2 x 100 Gbps networking.
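A quick back-of-the-envelope check shows these figures are plausible against the per-node hardware. The arithmetic below assumes decimal units and uses only the numbers reported above:

```python
# Figures from the article (assumption: decimal TB/GB, Gbps = 1/8 GB/s).
STORAGE_NODES = 180
BENCH_TBPS = 6.6       # 3FS benchmark read throughput, TB/s
TRAINING_TBPS = 1.4    # concurrent training read traffic, TB/s

total_tbps = BENCH_TBPS + TRAINING_TBPS              # 8.0 TB/s aggregate
per_node_gbps = total_tbps * 1000 / STORAGE_NODES    # ~44.4 GB/s per node

# Each node has two 200 Gbps NICs, i.e. ~50 GB/s of network bandwidth,
# so the reported throughput sits just under the per-node network ceiling.
nic_ceiling_gbps = 2 * 200 / 8                       # 50 GB/s
assert per_node_gbps < nic_ceiling_gbps
```

In other words, the cluster was driving each storage node at roughly 90% of its network capacity, which is consistent with a file system tuned almost entirely for raw read throughput.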
According to the published paper, 3FS was a critical part of DeepSeek’s software stack for training DeepSeek AI. It was tested on the Fire-Flyer 2 HPC solution, which achieved 80% of the performance of Nvidia’s DGX-A100 server solution at 50% of the price and 60% of the power consumption.
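The paper's cost and power figures translate into simple efficiency ratios, computed here from the percentages quoted above:

```python
# Ratios relative to Nvidia's DGX-A100 solution, as reported in the paper.
perf, price, power = 0.80, 0.50, 0.60

perf_per_dollar = perf / price   # 1.6x the performance per dollar
perf_per_watt = perf / power     # ~1.33x the performance per watt
```

So even at 80% of DGX-A100 performance, Fire-Flyer 2 delivered substantially better performance per dollar and per watt.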
Interested users can download the full Fire-Flyer File System and explore its random-read-first approach to AI-HPC from DeepSeek’s GitHub page. Given its performance characteristics, the open-source system is likely to appeal to both enthusiasts and enterprise AI-HPC users, although it may face some resistance related to distrust of Chinese technology before achieving widespread adoption.