
[@ServeTheHomeVideo] Tiered Storage for AI Workloads - Open Storage Summit 2025 Session 1

6 min read


Link: https://youtu.be/5r7C5BIouUI

Short Summary

Number One Action Item/Takeaway: Enterprises need to strategically architect tiered storage solutions, considering performance, capacity, latency, and cost-effectiveness, to efficiently manage the exploding data volumes and evolving demands of AI workloads, including training, inference, and retrieval-augmented generation (RAG).

Executive Summary: AI's growing data demands and the shift toward inference necessitate a tiered storage approach that balances performance and capacity. Organizations need to architect storage that spans fast flash for low-latency inference and large-capacity tiers for data lakes and archiving, creating an ecosystem that is ready for future AI workloads. Collaboration between vendors is key to delivering optimized, scalable AI-ready storage infrastructure.

Key Quotes

Five direct quotes from the transcript, chosen for their insights, data points, or strong opinions about AI and storage:

  1. Rob Stretch: "In the age of AI, storage is no longer a back-end function. It's a strategic driver of innovation, performance, and competitive advantage."
  2. John Kim: "RAG allows you to import and embed and index additional documents, enterprise content, web content, legal filings, financial filings, stock market data, PDFs. You can take all this data ... you have this embedding workflow ... and it unlocks the value or the information in these enterprise documents and web content, and it can be content from a few minutes ago."
  3. Kevin Tubbs: "As AI evolves fast, how do you future-proof? How do you make technology stack decisions that allow you to handle the workloads today but also be built to leverage the technologies intelligently as we move forward?"
  4. Anders Graham: "With a two-terabit die, you can have a 16-die stack. That's four terabytes in a single package. And with 32 packages on a PCB, that's how you get to your 128 TB drives. And so this is now a very easy way for Kioxia and others to get to that capacity level."
  5. Giorgio Regni: "If you look at the developer point of view, right? So developers are smart. They're also very lazy... And lazy means they want to automate everything."
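
The capacity arithmetic in quote 4 is easy to verify once you note the die size is two terabits (Tb), not terabytes; a quick check:

```python
# Capacity math behind the 128 TB drive figure in quote 4.
die_terabits = 2        # per 3D NAND die
dies_per_stack = 16     # dies stacked in one package

package_terabits = die_terabits * dies_per_stack    # 32 Tb
package_terabytes = package_terabits / 8            # 4 TB per package

packages_per_pcb = 32
drive_terabytes = package_terabytes * packages_per_pcb

print(package_terabytes, drive_terabytes)  # 4.0 128.0
```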

Detailed Summary


Key Topics:

  • AI and Storage Revolution: AI's rapid growth and complexity are fundamentally reshaping enterprise storage infrastructure.
  • Tiered Storage: The discussion focuses on the need for tiered storage solutions to meet the diverse demands of AI workloads (training and inference).
  • Performance vs. Capacity: Balancing the need for high performance (low latency, high throughput) with massive capacity is a central challenge.
  • Inference is Dominating: AI inference is rapidly becoming the dominant workload, shifting the focus to optimizing for speed, reliability, and real-time responsiveness.
  • Sustainability: Energy efficiency and environmental impact are increasingly important considerations in storage design and deployment.
  • Ecosystem Collaboration: The necessity of collaboration between technology vendors (NVIDIA, WekaIO, Scality, Kioxia, Supermicro) to deliver comprehensive AI storage solutions.

Arguments & Information:

  • AI Workload Demands:
    • AI models require not just capacity, but also performance, efficiency, flexibility, and orchestration across hybrid environments.
    • Enterprises are now planning for exabytes of storage across on-premises, edge, and cloud environments.
    • Generative AI is driving an explosion of data (text, images, code, video) demanding real-time access.
  • Training vs. Inference Storage Needs:
    • Training: Involves a mix of hard drives (for data lakes) and flash storage (for fast read performance during training). Checkpoints (saving progress) require very fast write/read performance (typically all-flash).
    • Inference: Traditionally thought to require no storage (models loaded into memory). But now, Retrieval Augmented Generation (RAG) and KV cache (key-value cache) are driving the need for storage.
  • Retrieval Augmented Generation (RAG):
    • RAG enhances large language models (LLMs) by incorporating real-time, relevant documents into queries, improving accuracy and relevance.
    • Requires embedding workflows, vector databases, and fast file and object storage for indexing and retrieval (a minimal retrieval sketch follows this list).
  • KV Cache:
    • Stores query context and related documents to speed up processing of similar or repeated queries.
    • Requires fast flash storage to store and retrieve KV-cache data quickly (a toy cache sketch follows this list).
  • Storage Tiers and Technologies:
    • Memory (HBM on GPU, CPU memory)
    • Local flash (NVMe)
    • Networked flash (NVMe over Fabrics)
    • Hard drives/Object Storage (for data lakes, query history, logs)
    • Tape/cold storage/cloud archives (a simple tier-placement sketch follows this list)
  • WekaIO (Neural Mesh):
    • Software-defined fabric connecting AI compute, network, and data.
    • Designed to scale with problem size and AI innovation.
    • Supports x86 and ARM architectures, on-premises and cloud deployments.
    • Engineered for both model training and inference, adapting to evolving AI needs.
    • Focus on real-time responsiveness and managing explosive data growth.
  • Scality (Object Storage):
    • Targets large-scale deployments (petabytes to exabytes).
    • S3-compatible, simplifying access for developers (see the boto3 sketch after this list).
    • Automated data lifecycle management, security features.
    • Object storage is moving up the tiers, gaining performance via NVMe and a leaner software stack.
    • GPUDirect Storage support for fast data paths to GPUs.
    • Used for primary data lakes, backup and immutable backup, and cold storage.
  • Kioxia (Flash Memory & SSDs):
    • Focus on high-performance and high-capacity SSDs.
    • PCIe 5.0 SSDs are becoming ubiquitous for training.
    • E3.S and E1.S form factors improve efficiency.
    • BiCS FLASH technology enhances flash density, power efficiency, and performance.
    • Innovations in 3D NAND are key to delivering high-capacity drives (128TB and beyond).
  • Supermicro:
    • Offers fully integrated and balanced end-to-end AI solutions.
    • DCBBS (Data Center Building Block Solutions) framework that includes rack-level and storage-level building blocks.
    • AI servers utilize NVIDIA platforms (HGX, GB, and NVL series) and Grace CPUs.
    • Tiered storage architecture using WekaIO (all-flash), Scality (object storage), and Kioxia SSDs.
    • Focus on performance, capacity, scalability, and cost-effectiveness.
  • Considerations for Future Storage Architectures:
    • Scalability: Solutions need to scale efficiently to handle massive data growth.
    • Low Latency: Minimizing latency is crucial for inference and RAG applications.
    • Flexibility: Architectures need to adapt to evolving AI workloads and technologies.
    • Cost-Effectiveness: Balancing performance and capacity with cost considerations.
    • Integration: Seamless integration of different storage tiers and technologies.
  • Balanced and Fully Integrated:
    • Systems have to be unified.
    • They must be optimized end to end, beyond the physical components.
    • They must be scalable and modular.
    • Resources must be balanced, with no over- or under-utilized systems.
    • Cost and performance must be balanced.
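
To make the RAG bullet concrete, here is a minimal retrieval sketch. The `embed()` function is a stand-in for a real embedding model, and the brute-force index stands in for a vector database; in production, both the index and the documents would live on the fast file/object tier.

```python
import numpy as np

# Minimal RAG retrieval sketch. embed() is a stand-in for a real embedding
# model, and the brute-force index stands in for a vector database.
def embed(text: str) -> np.ndarray:
    vec = np.zeros(64)
    for word in text.lower().split():
        vec[hash(word) % 64] += 1.0   # toy hashing "embedding"
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

documents = [
    "Q2 financial filing: revenue grew 12 percent year over year.",
    "Legal filing: patent dispute settled in March.",
    "Stock market data: index closed up 1.3 percent today.",
]
# The embedding workflow builds this index; in production it sits on
# fast file/object storage alongside the documents.
index = np.stack([embed(d) for d in documents])

def retrieve(query: str, k: int = 2) -> list:
    scores = index @ embed(query)     # cosine similarity (unit-norm vectors)
    top = np.argsort(scores)[::-1][:k]
    return [documents[i] for i in top]

# Retrieved passages get appended to the LLM prompt to ground its answer.
print(retrieve("How did revenue change in the latest filing?"))
```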
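The KV-cache idea reduces to memoizing attention state by prompt prefix. A toy sketch follows, with a local directory standing in for the flash-backed cache mount; all names and paths are illustrative.

```python
import hashlib
import pickle
from pathlib import Path

# Toy KV cache keyed by prompt prefix. A local directory stands in for
# the fast-flash tier that would hold serialized attention key/value state.
CACHE_DIR = Path("kv_cache")
CACHE_DIR.mkdir(exist_ok=True)

def _key(prefix: str) -> Path:
    return CACHE_DIR / hashlib.sha256(prefix.encode()).hexdigest()

def load_kv(prefix: str):
    """Return cached KV state for this prompt prefix, or None on a miss."""
    path = _key(prefix)
    return pickle.loads(path.read_bytes()) if path.exists() else None

def store_kv(prefix: str, kv_state) -> None:
    """Persist computed KV state so a repeated or similar query reuses it."""
    _key(prefix).write_bytes(pickle.dumps(kv_state))

# On a hit, the model resumes from the stored state and only processes the
# new suffix of the prompt instead of recomputing the shared prefix.
if load_kv("shared system prompt") is None:
    store_kv("shared system prompt", {"layer0": [0.1, 0.2]})  # placeholder state
```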
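The storage-tier list above maps naturally onto a small placement table. This sketch picks the cheapest tier that meets a workload's latency bound; all latency and cost figures are illustrative placeholders, not numbers from the session.

```python
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    latency_us: float     # illustrative access latency (microseconds)
    cost_per_tb: float    # illustrative relative cost

# Ordered cheapest-first; figures are rough placeholders.
TIERS = [
    Tier("tape/cloud archive", 60_000_000, 1),
    Tier("hdd/object store",       10_000, 4),
    Tier("networked flash",           200, 12),
    Tier("local nvme flash",           80, 16),
    Tier("hbm/cpu memory",            0.1, 400),
]

def place(max_latency_us: float) -> Tier:
    """Cheapest tier that still meets the workload's latency bound."""
    for tier in TIERS:  # cheapest-first scan
        if tier.latency_us <= max_latency_us:
            return tier
    raise ValueError("no tier satisfies the latency bound")

print(place(1_000).name)        # RAG index lookups -> networked flash
print(place(100_000_000).name)  # cold archive -> tape/cloud archive
```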
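Because Scality exposes an S3-compatible API, standard S3 tooling works against it. Below is a minimal boto3 sketch of landing data-lake objects and automating lifecycle tiering; the endpoint, bucket, credentials, and storage class are placeholders, not values from the talk.

```python
import boto3

# Scality is S3-compatible, so standard boto3 calls apply; endpoint and
# credentials below are placeholders.
s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.example.internal",  # hypothetical Scality endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Land raw training data in the object-store data lake tier.
s3.upload_file("shard-0001.parquet", "ai-data-lake", "raw/shard-0001.parquet")

# Lifecycle rule: age objects out to a colder storage class automatically.
s3.put_bucket_lifecycle_configuration(
    Bucket="ai-data-lake",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "tier-down-raw-data",
            "Filter": {"Prefix": "raw/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
        }]
    },
)
```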