
[@ServeTheHomeVideo] Storage to Enable Inference at Scale - Open Storage Summit 2025 Session 4



Link: https://youtu.be/GRWofZlr_3A

Short Summary

Number One Takeaway:

Organizations need to architect their AI infrastructure for inference at scale with a holistic, integrated approach, treating high-performance compute, networking, and storage as one ecosystem so that data flows efficiently and bottlenecks are minimized.

Executive Summary:

The Supermicro Open Storage Summit emphasizes the growing importance of inference and the need for scalable, high-throughput storage solutions to support real-time, model-driven decisions. The discussions highlight emerging technologies like Nvidia's NeMo Retriever, tiered storage architectures, and the benefits of object storage, underscoring the requirement for validated, GPU-optimized infrastructure that simplifies deployment and ensures efficient data pipelines. As AI evolves toward agentic models, businesses must prioritize energy efficiency, flexible data management, and a cohesive integration of hardware and software to meet the increasing demands of advanced inference workloads.

Key Quotes

Five direct quotes from the transcript that capture valuable insights and interesting data points:

  1. "Inference isn't just about serving a single model. It's about running thousands or millions of model driven decisions in real time across edge, data center, and cloud environments. And behind every low latency response is a silent workhorse, the storage infrastructure." (Rob Stretch)
  2. "Earlier AI factories used to focus primarily on training. Customers would deploy all of their infrastructure primarily for training of foundational models, fine-tuning them and and deploying them elsewhere. But now the expanded use of AI factories includes production serving as in inference and data prep as well." (Hares Aurora)
  3. "Enterprise data especially the unstructured enterprise data is expected to grow from 14 zerob." (Hares Aurora)
  4. "What hammerspace has done is that we have eliminated data gravity." (Tony Assaro)
  5. "There's no AI without data. There's no data without infrastructure. And I might add to that, there's no infrastructure without efficiency" (Ace Striker)

Detailed Summary

A detailed summary of the session, in bullet points, covering the key topics, arguments, and information discussed:

Key Topics:

  • AI Inference at Scale: The primary focus is on scaling AI inference effectively across different environments (edge, data center, cloud).
  • Storage Considerations for Inference: The importance of storage infrastructure for low-latency, high-volume, unstructured data access in inference workloads.
  • Ecosystem Collaboration: Emphasis on the collaborative approach of Supermicro, Nvidia, and other partners to deliver integrated AI solutions.
  • Agentic AI: How inference forms the backbone for AI agents.

Arguments & Information:

  • Shift from Training to Inference: The discussion highlights a fundamental shift in AI, moving from primarily focusing on training massive models to efficiently deploying and scaling those models for inference in production.
  • Inference Workload Characteristics: Inference workloads are characterized by high volume, high variety, and unstructured data, including logs, images, video, and sensor outputs, often stored as files and objects.
  • Need for Flexible, High-Throughput Storage: Efficiently feeding data to GPU-centric environments at scale demands flexible, high-throughput, and tightly integrated storage solutions (a prefetching sketch appears after this list).
  • Importance of New Storage Protocols and Data Services: Emerging storage protocols and data services are crucial for enabling efficient access to unstructured data across distributed inference frameworks, reducing bottlenecks, and achieving real-time performance.
  • Tiered Storage Architectures: Tiered storage architectures are gaining prominence, balancing performance, capacity, and cost to optimize inference workflows (a tier-placement sketch appears after this list).
  • Object Storage: Object storage provides massive scalability and durability for AI workflows (an S3-style read sketch appears after this list).
  • Parallel File Systems: Parallel file systems ensure low-latency access for performance-critical workloads.
  • Validated GPU-Optimized Infrastructure: Organizations are seeking easy-to-deploy, validated GPU-optimized infrastructure with integrated solutions tailored to emerging data pipelines.
  • Storage as a Critical Component: The session emphasizes that storage is not just an afterthought but a vital component for scaling unstructured data pipelines to match the speed of AI.
  • Nvidia's Role: Nvidia is focusing on AI factories and technologies like NeMo Retriever (delivered as NIM microservices) and Nvidia Dynamo for scaling generative AI, with storage playing a crucial role in these initiatives.
  • Nvidia NeMo Retriever: A suite of technologies that accelerate the processing of unstructured data for use as context data for LLMs and video language models (a conceptual retrieval sketch appears after this list).
  • Nvidia Dynamo: Nvidia's recommended solution for disaggregated inference, featuring a modular, open-source architecture, KV cache management, and the NIXL transfer library (a prefill/decode handoff sketch appears after this list).
  • Hammerspace's AI Data Platform: Hammerspace offers a standards-based parallel file system, multi-site data access, and data orchestration capabilities to address the challenges of distributed data and AI anywhere.
  • Cloudian's Object Storage: Cloudian is an object storage vendor focused on optimizing object storage for AI workloads, including integration with Nvidia GPUDirect for object storage.
  • Solidigm's SSD Solutions: Solidigm manufactures SSDs optimized for AI, with high-performance SSDs for GPU servers and high-density QLC SSDs for efficient storage.
  • Supermicro's Role as a Solution Provider: Supermicro integrates various technologies and partnerships to offer complete solutions for organizations looking to capitalize on inference at scale.
  • Power Efficiency: Optimizing the AI data center for energy consumption is key to scaling future AI infrastructures.
  • Collaboration: An ecosystem of AI, software, and hardware companies working together is essential to advancing the broader AI market.
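
Illustrative Sketches

The bullets above stay at the architecture level; the short Python sketches below make a few of the ideas concrete. All names, thresholds, and endpoints in them are illustrative assumptions, not details from the session.

Keeping GPUs fed requires overlapping storage reads with compute. This is a minimal prefetching sketch: a thread pool issues storage reads ahead of the consumer, with `read_shard` and `gpu_step` as stand-ins for real I/O and inference work.

```python
# Minimal prefetching sketch: overlap storage I/O with GPU work.
# read_shard and gpu_step are stand-ins, not a real storage or model API.
from concurrent.futures import ThreadPoolExecutor

def read_shard(name: str) -> bytes:
    return name.encode() * 1000          # stand-in for a storage read

def gpu_step(batch: bytes) -> int:
    return len(batch)                    # stand-in for inference work on a batch

shards = [f"shard-{i:04d}" for i in range(8)]
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(read_shard, s) for s in shards]  # reads start immediately
    total = sum(gpu_step(f.result()) for f in futures)      # compute overlaps later reads
print("bytes processed:", total)
```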
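Tiered storage balances performance, capacity, and cost by placing data according to how hot it is. Here is a toy placement policy, assuming three tiers (NVMe, QLC flash, object store) and made-up access-frequency thresholds:

```python
# Toy tier-placement policy; tier names and thresholds are assumptions.
from dataclasses import dataclass

@dataclass
class ObjectStats:
    key: str
    reads_per_day: float

def choose_tier(stats: ObjectStats) -> str:
    if stats.reads_per_day >= 100:
        return "nvme"          # latency-critical data close to the GPUs
    if stats.reads_per_day >= 1:
        return "qlc-flash"     # high-density capacity tier
    return "object-store"      # durable, massively scalable cold tier

for obj in [ObjectStats("kv-cache/shard-0", 5000),
            ObjectStats("videos/cam-17.mp4", 3),
            ObjectStats("logs/2023-01.tar", 0.01)]:
    print(obj.key, "->", choose_tier(obj))
```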
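Object stores such as Cloudian's expose the S3 API, so fetching an artifact for an inference pipeline looks like any S3 read. A minimal sketch with boto3; the endpoint, bucket, and key are placeholders:

```python
# Reading an artifact from an S3-compatible object store with boto3.
# Endpoint, bucket, and key are placeholders; credentials come from the
# usual boto3 sources (environment, config files, or instance roles).
import boto3

s3 = boto3.client("s3", endpoint_url="https://objectstore.example.com")
resp = s3.get_object(Bucket="ai-datasets", Key="embeddings/part-0000.parquet")
data = resp["Body"].read()
print(f"fetched {len(data)} bytes")
```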
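NeMo Retriever's job, at a high level, is turning unstructured data into context for a model. This is a conceptual sketch of that retrieval step only, with a toy bag-of-words "embedding" standing in for a real embedding model; it is not the NeMo Retriever API.

```python
# Conceptual retrieval step: rank documents against a query and pass the
# best match to an LLM as context. The bag-of-words "embedding" is a toy.
import math
from collections import Counter

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = ["storage tiering balances cost and performance",
        "object storage scales to petabytes of unstructured data",
        "parallel file systems give GPUs low-latency file access"]
query = "how do GPUs get low latency access to files"
ranked = sorted(docs, key=lambda d: cosine(embed(query), embed(d)), reverse=True)
print("context for the LLM:", ranked[0])
```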
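Disaggregated inference, the pattern Dynamo targets, splits the compute-heavy prefill phase from the latency-sensitive decode phase and hands the KV cache between them. The sketch below mimics that split in-process only; it is not the Dynamo or NIXL API, and the KVCache here is a stand-in for real key/value tensors.

```python
# Conceptual prefill/decode split with a KV-cache handoff.
# Not the Dynamo/NIXL API: the cache is a local object standing in for
# tensors that a transfer library would move between GPUs or nodes.
from dataclasses import dataclass, field

@dataclass
class KVCache:
    tokens: list = field(default_factory=list)   # stand-in for K/V tensors

def prefill_worker(prompt: str) -> KVCache:
    # Compute-heavy phase: process the whole prompt once.
    return KVCache(tokens=prompt.split())

def decode_worker(cache: KVCache, steps: int) -> list:
    # Latency-sensitive phase: generate token by token, reusing the cache.
    return [f"<tok{i}|{len(cache.tokens)} cached>" for i in range(steps)]

cache = prefill_worker("explain tiered storage for inference at scale")
print(decode_worker(cache, steps=3))
```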