Large language models (LLMs) are an important generative AI use case: vast deep learning models trained on large datasets to perform a broad range of tasks, including content creation, language translation, customer support assistance, code generation, DNA/protein sequence analysis, and much more. Applications for LLMs are seemingly endless, as they can tackle general problems and generate meaningful responses to many different types of queries.
Despite being trained on large datasets, LLMs may never have seen a vast amount of information on certain topics, or proprietary data, during their initial training. When queried on these topics, they can perform poorly or even return responses that are not true. Businesses are also wary of interacting with public LLMs due to the risk of proprietary data being leaked and reused in subsequent responses to other users. As a result, it can be a challenge to deploy LLMs that are both accurate and high performing to address specific business problems.
RAG to the Rescue
Retrieval-Augmented Generation (RAG) was introduced to help mitigate factually incorrect responses, or hallucinations, from LLMs. This AI framework retrieves factual data from a separate external data source, such as a vector database (DB), enabling pre-trained LLMs to create more accurate and relevant responses to queries and, in turn, better address specific business decisions. Businesses can upload their proprietary data into internally hosted vector DBs to help pre-trained LLMs return responses grounded in specific business data. Vector DBs complement and enhance AI models by converting data into vector embeddings, which allows for more efficient storage and retrieval. Vector indexing methods, such as Hierarchical Navigable Small World (HNSW), organize these vectors for fast search and retrieval.
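To make the retrieval step concrete, here is a minimal, self-contained sketch of similarity search over embeddings. It uses exhaustive cosine-similarity scoring over a toy in-memory "vector DB"; this exact search is what approximate indexes such as HNSW aim to match at a fraction of the compute and memory cost. The document IDs and 3-dimensional embeddings are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def exact_top_k(query, embeddings, k=2):
    """Exhaustive nearest-neighbor search: the exact answer that
    approximate indexes (e.g., HNSW) try to reproduce cheaply."""
    scored = sorted(embeddings.items(),
                    key=lambda item: cosine_similarity(query, item[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

# Toy "vector DB": document IDs mapped to pre-computed embeddings.
# Real embeddings have hundreds or thousands of dimensions.
db = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.9, 0.0],
    "doc_c": [0.7, 0.3, 0.1],
}
print(exact_top_k([1.0, 0.0, 0.0], db))  # -> ['doc_a', 'doc_c']
```

In a RAG pipeline, the text behind the returned document IDs would be inserted into the LLM prompt as grounding context.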
One issue that arises when storing these large datasets in vector DBs is that the vector embeddings, as well as popular in-memory vector indexes (such as HNSW), can require a sizeable amount of DRAM, which poses significant system design and cost challenges. Holding large datasets entirely in system memory is expensive given today’s DRAM prices. Individual server nodes also have upper limits on maximum memory capacity, requiring scale-out solutions to deploy more system memory, which incurs even more cost.
Disk-based indexes built with Disk Approximate Nearest Neighbor (DiskANN1) algorithms (referred to as DiskANN indexes) provide another vector index option that offloads much of the system memory footprint to disk. DiskANN algorithms require high-performing SSDs at the backend to maintain high database throughput while traversing very complex vector spaces to return similarity search results.
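DiskANN belongs to the family of graph-structured ANN indexes (per the cited repository title): search proceeds by greedily hopping along a proximity graph toward the query. The toy sketch below shows only that core traversal idea, with hypothetical 1-D vectors and a hand-built neighbor graph; in a real disk-based index, each hop would read a node's vector and adjacency list from the SSD rather than from DRAM, which is why backend SSD performance matters.

```python
def greedy_graph_search(graph, vectors, entry, query, dist):
    """Toy greedy traversal of a proximity graph, the core idea behind
    graph-structured ANN indexes such as DiskANN. Hop to whichever
    neighbor is closer to the query; stop at a local minimum. A real
    disk-based index fetches each node's vector and neighbor list from
    the SSD on every hop."""
    current = entry
    while True:
        best = min(graph[current],
                   key=lambda n: dist(query, vectors[n]),
                   default=current)
        if dist(query, vectors[best]) >= dist(query, vectors[current]):
            return current
        current = best

def l2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical 1-D example: four nodes chained into a simple graph.
vectors = {0: [0.0], 1: [2.0], 2: [5.0], 3: [9.0]}
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(greedy_graph_search(graph, vectors, entry=0, query=[4.0], dist=l2))  # -> 2
```

Production DiskANN additionally uses beam search over many candidates, compressed in-memory vectors for distance estimates, and carefully laid-out on-disk nodes; none of that detail is modeled here.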
How Does an SSD Compare to DRAM?
The Application Performance Lab at KIOXIA set out to see if one of our fast, large capacity PCIe 5.0 SSDs could be a cost-effective storage solution in place of system memory for storing large datasets and delivering high throughput.
Our lab tested disk-based DiskANN indexes (on a KIOXIA CM7 Series PCIe 5.0 SSD) against purely memory-based HNSW indexes, measuring queries per second (QPS), system memory footprint, and recall. We used the VectorDBBench benchmarking tool and datasets ranging from 1 million to 100 million vectors to measure the database throughput that the vector DB achieved when querying the vector space for a similarity search. We observed that at lower total vector counts, the DiskANN indexes outperformed the HNSW indexes, and at larger dataset sizes, database throughput was similar.
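Recall, one of the metrics above, quantifies how closely an approximate index matches an exact search: the fraction of the true top-k neighbors that the index actually returned. A minimal sketch, with made-up result lists for illustration:

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k neighbors (exact_ids) that the
    approximate index (e.g., HNSW or DiskANN) returned in approx_ids.
    Benchmarks such as VectorDBBench report QPS alongside this
    quality metric, since speed is meaningless without accuracy."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

# Hypothetical run: the ANN index found 9 of the 10 true neighbors.
exact = list(range(10))
approx = [0, 1, 2, 3, 4, 5, 6, 7, 8, 42]
print(recall_at_k(approx, exact))  # -> 0.9
```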
We also measured the amount of system memory in gigabytes2 (GB) used when running both the disk-based DiskANN indexes and the memory-based HNSW indexes, and noted that the HNSW indexes consistently used much larger memory footprints (see below).
Using disk instead of DRAM can lower the system memory footprint required to run RAG-based applications. In this comparison, DiskANN indexes had 32% lower total system memory usage at 1 million vectors, 64% lower at 10 million vectors, and 74% lower at 100 million vectors. Disk-based indexes also enable system designs (and associated purchases) that require less DRAM to run large vector DB datasets, by storing the indexes on fast KIOXIA CM7 Series PCIe 5.0 SSDs.
Check out the complete results compiled from our lab testing in the full performance brief available here.
NOTES:
1 The DiskANN repository requests the following citation:
@misc{diskann-github,
  author  = {Simhadri, Harsha Vardhan and Krishnaswamy, Ravishankar and Srinivasa, Gopal and Subramanya, Suhas Jayaram and Antonijevic, Andrija and Pryce, Dax and Kaczynski, David and Williams, Shane and Gollapudi, Siddarth and Sivashankar, Varun and Karia, Neel and Singh, Aditi and Jaiswal, Shikhar and Mahapatro, Neelam and Adams, Philip and Tower, Bryan and Patel, Yash},
  title   = {DiskANN: Graph-structured Indices for Scalable, Fast, Fresh and Filtered Approximate Nearest Neighbor Search},
  url     = {https://github.com/Microsoft/DiskANN},
  version = {0.6.1},
  year    = {2023}
}
2 Definition of capacity: KIOXIA Corporation defines a megabyte (MB) as 1,000,000 bytes, a gigabyte (GB) as 1,000,000,000 bytes, a terabyte (TB) as 1,000,000,000,000 bytes and a petabyte (PB) as 1,000,000,000,000,000 bytes. A computer operating system, however, reports storage capacity using powers of 2: 1 Gbit = 2^30 bits = 1,073,741,824 bits, 1 GB = 2^30 bytes = 1,073,741,824 bytes, 1 TB = 2^40 bytes = 1,099,511,627,776 bytes and 1 PB = 2^50 bytes = 1,125,899,906,842,624 bytes, and therefore shows less storage capacity. Available storage capacity (including examples of various media files) will vary based on file size, formatting, settings, software and operating system, and/or pre-installed software applications, or media content. Actual formatted capacity may vary.
TRADEMARKS:
PCIe is a registered trademark of PCI-SIG. All other company names, product names and service names may be trademarks or registered trademarks of third-party companies.
DISCLAIMERS:
KIOXIA America, Inc. may make changes to specifications and product descriptions at any time. The information presented in this blog is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. Any performance tests and ratings are measured using systems that reflect the approximate performance of KIOXIA America, Inc. products as measured by those tests. In no event will KIOXIA America, Inc. be liable to any person for any direct, indirect, special or other consequential damages arising from the use of any information contained herein, even if KIOXIA America, Inc. is advised of the possibility of such damages.