Interested in Enhancing Predictive LLMs through SSD Scaling? Introducing KIOXIA AiSAQ™ Search Engine Technology that Offloads LLM Data to SSDs

Large language models (LLMs) are revolutionizing the way we work by letting us harness the power of natural language.  These AI systems are trained on massive datasets that include everything from everyday conversations to coding languages.  When it comes time for an LLM to respond to a question or prompt, it relies on this training data as its foundation.

But what happens when the LLM's training data doesn't have the information it needs to give a proper answer?  That's where Retrieval Augmented Generation (RAG) comes in.  This technique lets LLMs draw on extra information that wasn't available during training, effectively filling in knowledge gaps.  With RAG, language models can handle topics like current events, sensitive or proprietary information, and other areas where the initial training data might be lacking.

RAG-enabled language model pipelines often rely on a database optimized for the vectorized representation of natural language information to store the extra context that can help the LLM generate better responses.  These vector databases can grow large – and that's where searchability becomes an issue.
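
To make that concrete, here is a minimal sketch of the retrieval step in a RAG pipeline.  Everything in it is a stand-in: the embed() function fakes an embedding model, and a three-item Python list plays the role of the vector database:

```python
import hashlib
import numpy as np

def embed(text: str) -> np.ndarray:
    # Stand-in for a real embedding model: a deterministic random unit vector.
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")
    v = np.random.default_rng(seed).standard_normal(384)
    return v / np.linalg.norm(v)

# Toy "vector database": document chunks and their embeddings.
docs = ["Q3 earnings summary", "Internal API changelog", "Support FAQ"]
doc_vecs = np.stack([embed(d) for d in docs])

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    scores = doc_vecs @ q          # cosine similarity (vectors are unit length)
    top = np.argsort(-scores)[:k]  # indices of the k best matches
    return [docs[i] for i in top]

# Retrieved chunks are prepended to the prompt so the LLM can ground
# its answer in information it was never trained on.
question = "What changed in the API?"
context = "\n".join(retrieve(question))
prompt = f"Context:\n{context}\n\nQuestion: {question}"
```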

To tackle performance issues that can occur when searching large vector datasets, developers typically use Approximate Nearest Neighbor (ANN) search indexing techniques such as Hierarchical Navigable Small Worlds (HNSW).  These indexes work like fast lookup tables that allow the system to quickly find relevant information in the vector database.  Often, these indexes are stored in DRAM for fast performance.  However, modern real-world indexes can be 10x larger than the dataset itself, requiring on the order of a terabyte¹ of DRAM to store them – a major scalability challenge.
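
As a rough illustration (not our test configuration), building and querying an in-DRAM HNSW index with the open-source hnswlib package looks something like this; the dataset is random and the parameters are common starting points rather than tuned values:

```python
import hnswlib
import numpy as np

dim, num_elements = 128, 100_000
data = np.random.rand(num_elements, dim).astype(np.float32)

# Build an HNSW index; the graph and vectors live entirely in DRAM.
index = hnswlib.Index(space="l2", dim=dim)
index.init_index(max_elements=num_elements, ef_construction=200, M=16)
index.add_items(data, np.arange(num_elements))

# ef controls the query-time speed/recall trade-off.
index.set_ef(50)
labels, distances = index.knn_query(data[:1], k=10)
```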

Enter AiSAQ, KIOXIA's innovative answer to this scalability problem.  This cutting-edge Approximate Nearest Neighbor search technology tackles the challenges of working with massive amounts of data by offloading those heavyweight indexes from DRAM to SSD storage.  This means that scalability issues with large datasets can now be solved with fast, off-the-shelf SSDs instead of relying solely on DRAM!
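
This post doesn't detail AiSAQ's internals, so the following toy sketch only illustrates the general principle behind SSD-resident ANN search in DiskANN-style designs: keep a small compressed copy of the vectors in DRAM to pick candidates, and read full-precision vectors from the SSD only to re-rank a short list.  The file name, sizes, and brute-force candidate scan are all hypothetical simplifications:

```python
import numpy as np

dim, n = 128, 100_000
path = "vectors.bin"  # hypothetical file, assumed to live on an NVMe SSD

# One-time setup: write full-precision vectors to SSD-backed storage.
full = np.memmap(path, dtype=np.float32, mode="w+", shape=(n, dim))
full[:] = np.random.rand(n, dim).astype(np.float32)
full.flush()

# DRAM keeps only an 8-bit quantized copy (~1/4 the float32 footprint;
# real systems compress far more aggressively, e.g. product quantization).
coarse = (np.asarray(full) * 255).astype(np.uint8)

def search(q: np.ndarray, k: int = 10, shortlist: int = 200) -> np.ndarray:
    # Stage 1: cheap scan over the compressed in-DRAM vectors.
    approx = coarse.astype(np.float32) / 255.0
    cand = np.argsort(((approx - q) ** 2).sum(axis=1))[:shortlist]
    # Stage 2: re-rank the shortlist with exact vectors read from SSD.
    exact = np.asarray(full[cand])
    return cand[np.argsort(((exact - q) ** 2).sum(axis=1))[:k]]

top = search(np.random.rand(dim).astype(np.float32))
```

The two-stage structure is why a fast SSD matters: every query ends with a burst of reads from storage, so SSD latency sits directly on the query path.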

By pairing AiSAQ with fast PCIe® 5.0 SSDs, KIOXIA has created a scalability pathway for LLM-RAG systems that doesn't require the usual trade-off between in-memory and on-disk solutions.  We wanted to see this technology in action, so our lab set up a comparison test to put AiSAQ through its paces.

We compared² AiSAQ's DRAM usage against an HNSW indexing system³.  The results were striking – HNSW used a whopping 60.7 gibibytes⁴ (GiB) of DRAM, while AiSAQ kept its DRAM usage down to a mere 155 mebibytes⁴ (MiB).  And the best part?  AiSAQ didn't have to sacrifice speed (queries per second) or accuracy (recall) for that reduction in DRAM usage, as seen in the figure below:

[Figure: AiSAQ's DRAM Usage]

The numbers are astonishing – a ~396x reduction in DRAM utilization.  This is huge news for anyone working with LLMs, as it means you can store even bigger supplemental knowledge datasets and still get the speed and accuracy you need.
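
If you'd like to sanity-check DRAM usage in your own setup, one coarse but simple approach is to sample the process's resident set size around index load with the psutil package.  This is just an illustrative sketch, not the harness used in our test:

```python
import psutil

def rss_mib() -> float:
    # Resident set size (physical DRAM) of the current process, in MiB.
    return psutil.Process().memory_info().rss / 2**20

before = rss_mib()
# ... build or load the ANN index under test here ...
after = rss_mib()
print(f"Approximate index DRAM footprint: {after - before:.1f} MiB")
```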

The introductory AiSAQ tech brief, including the DRAM utilization benchmark and test results, is available here.


FOOTNOTES:

1 Definition of capacity: Kioxia Corporation defines a megabyte (MB) as 1,000,000 bytes, a gigabyte (GB) as 1,000,000,000 bytes, a terabyte (TB) as 1,000,000,000,000 bytes and a petabyte (PB) as 1,000,000,000,000,000 bytes. A computer operating system, however, reports storage capacity using powers of 2 for the definition of 1 Gbit = 2³⁰ bits = 1,073,741,824 bits, 1 GB = 2³⁰ bytes = 1,073,741,824 bytes, 1 TB = 2⁴⁰ bytes = 1,099,511,627,776 bytes and 1 PB = 2⁵⁰ bytes = 1,125,899,906,842,624 bytes and therefore shows less storage capacity. Available storage capacity (including examples of various media files) will vary based on file size, formatting, settings, software and operating system, and/or pre-installed software applications, or media content. Actual formatted capacity may vary.

2 Based on testing performed by KIOXIA America, Inc., and completed on October 1, 2024.

3 The test configuration included one Supermicro® AS-2125HS-TNR PCIe 5.0 server deployed with two 15.36 TB U.2 KIOXIA CD8P Series Data Center SSDs running in a 100G network.  Test software, derived from https://github.com/erikbern/ann-benchmarks, was used with custom extensions written for AiSAQ search engine technology developed by Kioxia Corporation.

4 A mebibyte (MiB) means 2²⁰, or 1,048,576 bytes.  A gibibyte (GiB) means 2³⁰, or 1,073,741,824 bytes.

TRADEMARKS:

AiSAQ is a trademark of KIOXIA Corporation.  PCIe is a registered trademark of PCI-SIG.  Supermicro is a registered trademark of Super Micro Computer, Inc. or its subsidiaries in the United States and other countries.  All other company names, product names and service names may be trademarks of third-party companies.

DISCLAIMERS:

KIOXIA America, Inc. may make changes to specifications and product descriptions at any time.  The information presented in this blog is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.  Any performance tests and ratings are measured using systems that reflect the approximate performance of KIOXIA America, Inc. products as measured by those tests.  In no event will KIOXIA America, Inc. be liable to any person for any direct, indirect, special or other consequential damages arising from the use of any information contained herein, even if KIOXIA America, Inc. is advised of the possibility of such damages.

The views and opinions expressed in this blog are those of the author(s) and do not necessarily reflect those of KIOXIA America, Inc.
