AI Doesn’t Have a Storage Problem – It Has Many Storage Problems


In my first blog post for this series, I talked about the rapid growth of AI, and how Large Language Models (LLMs) and generative AI are driving this growth. I also touched on the crucial role that flash memory plays in enabling high-performance storage solutions needed for AI training and deployment.

Building on that foundation, it’s important to explore the specific storage challenges that arise as AI systems become more advanced and data-intensive. In this post, I’ll examine how different phases of the AI workflow—ranging from data gathering and reformatting to training, tuning, inference, and retrieval-augmented generation—place unique demands on storage. I’ll also discuss critical storage considerations such as bandwidth, latency, capacity, power efficiency, and security, all of which play a role in optimizing AI performance across these phases.

AI’s Storage Struggle

Last year at the 2024 Files and Storage Technologies (FAST ’24) conference, there was a panel discussion titled “Storage Systems in the LLM Era,” with representation from hyperscalers and academia. About seven minutes into the panel, the moderator asked what AI needs from storage and whether it has any special requirements. The first response, echoed by several panelists, was prefaced by the warning that it might be inflammatory. To paraphrase: “AI doesn’t have a storage problem, just keep making it cheaper and bigger and get out of the way.” In this blog I won’t address the engineering efforts needed to “just keep making storage cheaper and bigger”; instead, I’ll offer my rebuttal and opinions on the diverse range of storage problems that AI workloads face.

First and foremost: there is no single representative AI workload for storage. The AI workflow is complex and involves many phases, each of which places unique and differing demands on storage. I will touch briefly on the following phases: data gathering, reformatting, training, tuning, inference, and retrieval augmented generation. Depending on the scale of your organization, there may be many separate storage systems involved, with some dedicated to specific phases of the AI workflow. In many environments, storage resources will be shared across many or all phases, further complicating the demands on storage due to the “I/O blender” effect of simultaneously servicing workloads with very different characteristics.

First Things First: Data Gathering and Reformatting

The first phase of the AI workflow involves gathering and reformatting data, which can come from crawling the internet or from various internal sources such as sensors, databases, or files. The data volume in this phase can be immense, especially for deep learning models that depend on large datasets. Storage at this stage needs to be scalable and cost-effective, since the raw data may be unstructured or semi-structured and require reformatting and preprocessing, potentially growing the dataset further. Preprocessing may involve cleaning the data, handling missing values, transforming data types, and potentially augmenting the dataset; it also introduces a need for data provenance. Data provenance requires metadata tagging and tracking to ensure data quality and reproducibility, and data security becomes more critical as sensitive information may be present within the dataset. The processed data is often stored in a format optimized for training, such as TFRecord or Parquet, which further emphasizes the need for efficient read/write operations.
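
To make this concrete, here is a minimal preprocessing sketch in Python using pandas with a Parquet output; the file names and column names are hypothetical, and a production pipeline would add far more validation and provenance tracking:

```python
# Minimal reformatting sketch, assuming pandas and pyarrow are installed.
# File names and column names are hypothetical.
import pandas as pd

# Load semi-structured raw data gathered from crawls or internal sources.
raw = pd.read_json("raw_records.jsonl", lines=True)

# Clean the data: drop duplicates, coerce types, handle missing values.
clean = raw.drop_duplicates()
clean["timestamp"] = pd.to_datetime(clean["timestamp"], errors="coerce")
clean = clean.dropna(subset=["text"])

# Attach simple provenance metadata for reproducibility.
clean["source_file"] = "raw_records.jsonl"

# Write in a columnar, compressed format optimized for training reads.
clean.to_parquet("training_data.parquet", compression="zstd")
```

Even in this toy example, the workload profile is visible: large sequential reads of raw data, followed by large writes of the reformatted output.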

Characteristics required of storage devices for data gathering and reformatting

  • High bandwidth
  • Large capacity/space efficiency
  • Low cost per bit
  • Power efficiency
  • Security features
  • Mixed read/write workload

Training, Tuning, Distilling, and Checkpointing

The next phases of the AI workflow are training, tuning, and optionally distillation. The reformatted data must be fed to the processing units (typically GPUs or TPUs) to create and modify the AI models. To develop the final AI models, the processing units analyze the data to find patterns and relationships. This may involve feeding data to thousands of GPUs acting as a distributed computing cluster with extreme aggregate bandwidth. Training may take days, weeks, or even months to complete in the case of LLMs such as Llama-3, so it is critical to prevent the processing units from waiting on storage devices.
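
As an illustration of keeping the processing units fed, here is a minimal input-pipeline sketch assuming PyTorch; the dataset is an in-memory stand-in for real preprocessed shards, and the batch size and worker counts are illustrative:

```python
# Minimal training input pipeline sketch, assuming PyTorch.
import torch
from torch.utils.data import DataLoader, Dataset

class RandomRecordDataset(Dataset):
    """Stand-in for a dataset of preprocessed training records."""
    def __len__(self):
        return 10_000

    def __getitem__(self, idx):
        # A real dataset would read a record from Parquet/TFRecord shards here.
        return torch.randn(1024)

if __name__ == "__main__":
    loader = DataLoader(
        RandomRecordDataset(),
        batch_size=256,
        num_workers=8,      # parallel reads help hide storage latency
        pin_memory=True,    # faster host-to-GPU copies
        prefetch_factor=4,  # keep batches queued ahead of the GPU
    )
    for batch in loader:
        pass  # the forward/backward pass would run here
```

The design goal is simple: overlap storage reads with computation so the accelerators never stall, which is why low latency and high IOPS matter in this phase.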

Characteristics required of storage devices for training, tuning, and distilling

  • Low latency
  • High IOPS
  • High bandwidth (particularly in large cluster environments)
  • Large capacity
  • Read dominated

Because the training process may take considerable time, checkpointing is often employed. Checkpointing involves saving the state of the AI model at regular intervals during training, allowing for faster recovery in case of failures. Checkpointing is also useful for tuning purposes; if suboptimal results are observed, the model can be rolled back to a prior checkpoint and modified without completely starting over. In a distributed computing cluster, restoring from checkpoints requires all of the nodes to be synchronized; a single slow device will make all of the cluster members wait.
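
A minimal checkpointing sketch, assuming PyTorch (the checkpoint path and contents are illustrative), shows where the write-dominated burst hits storage:

```python
# Minimal checkpoint save/restore sketch, assuming PyTorch.
import torch

def save_checkpoint(model, optimizer, step, path="checkpoint.pt"):
    # A large, mostly sequential, write-dominated burst hits storage here;
    # in a synchronized cluster, a slow write stalls every member.
    torch.save({
        "step": step,
        "model_state": model.state_dict(),
        "optimizer_state": optimizer.state_dict(),
    }, path)

def restore_checkpoint(model, optimizer, path="checkpoint.pt"):
    state = torch.load(path)
    model.load_state_dict(state["model_state"])
    optimizer.load_state_dict(state["optimizer_state"])
    return state["step"]
```

Because every node must finish writing (or reading) before the cluster proceeds, it is the slowest device's tail latency, not the average, that sets the pace.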

Characteristics required of storage devices for checkpointing and restore

  • Low latency
  • Short tail latency
  • High bandwidth
  • Write dominated

Time for Inference

The inference phase involves using trained AI models to make predictions, create content, or classify new data. Inference may be performed either interactively or in batch mode, with differing demands on storage. An emerging trend in inferencing is the adoption of “mixture of experts” solutions, which switch between different models depending on the query itself. Edge AI applications such as mobile phones face additional storage challenges with respect to packaging, power, and access latency, particularly when local inferencing uses flash memory instead of, or as a supplement to, DRAM.
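
As a rough illustration of flash-backed inference on a memory-constrained device, here is a sketch using NumPy's memory mapping; the weight file and its layout are hypothetical:

```python
# Sketch of flash-backed weight access on an edge device, assuming NumPy.
# The weight file "model_weights.bin" and its layout are hypothetical.
import numpy as np

# Memory-map the weights instead of copying them all into DRAM:
# pages are read from flash on demand, so the storage device's read
# latency sits directly on the inference critical path.
weights = np.memmap("model_weights.bin", dtype=np.float16, mode="r")

def layer_slice(offset, size):
    # Only the pages actually touched are faulted in from flash.
    return np.asarray(weights[offset:offset + size])
```

In this arrangement, flash effectively extends DRAM, which is why read latency and power efficiency dominate the storage requirements for edge inference.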

Characteristics required of storage devices for inference

  • Low latency (Interactive, mixture of experts, multi-tenancy, mobile)
  • Low power (edge and mobile)
  • Packaging (mobile)
  • High bandwidth
  • Read dominated

RAG’s Role

The final phase of AI workloads is Retrieval Augmented Generation, or RAG. RAG enhances AI models by integrating external knowledge sources, allowing models trained on public data to be used with domain-specific and/or proprietary enterprise data, enabling more detailed, timely, and accurate responses.
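
To show the shape of the retrieval step, here is a minimal sketch using NumPy, with a hypothetical embed() function standing in for a real embedding model; a production deployment would use a vector database rather than a flat array:

```python
# Minimal RAG retrieval sketch, assuming NumPy.
# embed() is a hypothetical stand-in for a real embedding model.
import numpy as np

def embed(text: str) -> np.ndarray:
    # Deterministic-per-run fake embedding; returns a unit-length vector.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

# Enterprise documents embedded ahead of time; reads dominate at query time.
docs = ["Q3 revenue report", "internal design spec", "support runbook"]
index = np.stack([embed(d) for d in docs])

def retrieve(query: str, k: int = 2) -> list[str]:
    scores = index @ embed(query)       # cosine similarity (unit vectors)
    top = np.argsort(scores)[::-1][:k]  # many small random reads in practice
    return [docs[i] for i in top]

# Retrieved passages are appended to the prompt before generation.
print(retrieve("What were last quarter's earnings?"))
```

At enterprise scale the index spills out of DRAM onto storage, and each query becomes a burst of small random reads, which is why low latency and high IOPS top the list below.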

Characteristics required of storage devices for RAG solutions

  • Low latency
  • High IOPS
  • Security features
  • Read dominated

KIOXIA: Shaping AI Storage Advancements

As AI continues to evolve, so too must the storage solutions that support it. Each phase of the AI workflow places unique and demanding requirements on the storage infrastructure. At KIOXIA, we are actively researching and developing advanced storage technologies that meet the growing needs of AI. In my next post, I’ll take a closer look at vector databases and their role in RAG. Stay tuned!


All other company names, product names and service names may be trademarks of their respective companies.

Disclaimer
The views and opinions expressed in this blog are those of the author(s) and do not necessarily reflect those of KIOXIA America, Inc.
