[@ServeTheHomeVideo] Agentic AI Storage Solutions - Open Storage Summit 2025 Session 2
Link: https://youtu.be/TaYGRMw7pm8
Short Summary
Number One Takeaway: Agentic AI requires a paradigm shift in storage infrastructure to prioritize high-speed, low-latency access, scalability, data intelligence, and power efficiency, demanding a holistic approach to system design and infrastructure management.
Executive Summary: Enterprises adopting agentic AI must re-evaluate their storage architecture to meet the demands of autonomous systems that require rapid data access and intelligent data management. This involves adopting scalable, high-performance solutions and leveraging software-defined approaches.
Key Quotes
Five quotes from the transcript that capture its key insights:
- "Agentic AI introduces a new class of challenges, especially when it comes to the storage infrastructure. These systems demand massive high-speed, low-latency storage paired with scalable, flexible architectures that can seamlessly manage diverse data types, integrate with compute and networking layers, and still meet strict performance, security, data governance, and cost-efficiency requirements." - This quote highlights the unique demands agentic AI places on storage infrastructure, emphasizing the need for speed, scalability, and integrated management.
- "The messaging that we are hearing, and I think you hit it on the head, is that 2025 is really the year of inferencing and agentic AI, because we were severely undercalling the amount of compute and storage that inference and agentic AI was going to translate into." - This quote underscores the industry's evolving understanding of the resource demands of agentic AI, particularly the underestimation of compute and storage needs, suggesting a significant shift in planning and investment.
- "Bottom line, the story is storage is a critical enabler for agentic AI. We need it because we can get to the first token (what is called time to first token), reduce latency, and really cut down on unnecessary GPU computation from re-establishing the KV cache every time we have to start again." - This quote emphasizes the pivotal role of storage in enabling faster token generation and reducing GPU workload by efficiently managing the KV cache.
- "...where you can see the storage software really make a difference is the ability to accelerate the GPU and make sure that you're running it at high utilization to ensure you get the best ROI and the lowest latency." - This highlights the crucial role of storage software in maximizing GPU utilization, leading to better ROI and lower latency, which are critical for agentic AI applications.
- "So AMD is well positioned in all the stages of the AI storage, but we are not doing this alone. So today we're here with our partners, and we build tested, optimized, and scalable solutions from edge to cloud, from research to real-world deployment." - This emphasizes the need for partnerships to deliver scalable solutions.
Detailed Summary
Key Topics & Arguments:
The Rise of Agentic AI and its Infrastructure Demands:
- Agentic AI is a pivotal moment in enterprise AI, characterized by autonomous decision-making, multi-step tasks, and complex interactions.
- It requires a new class of infrastructure, particularly for storage.
- Traditional AI storage solutions are being pushed to their limits.
Core Components of an Agentic AI System:
- Large Language Models/Generative AI models: "Brains" of the operation.
- Key Value Cache (KV Cache): Short-term memory for immediate results and memory tokens, enabling context retention and faster processing.
- Control Flow Layer: Orchestrates how agents plan, retrieve information (e.g., Retrieval Augmented Generation or RAG), and execute tasks, often using vector databases and knowledge graphs.
- Tool Invocation: Agents call external APIs or internal services for real-time task execution.
- Feedback Loops: Continuous learning and improvement based on real-world outcomes and business goals.
- Supporting capabilities: model hosting, vectorization services, and agent catalogs of reusable modules (a minimal sketch of the overall loop follows this list).
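A minimal Python sketch may make these moving parts concrete. The `llm`, `vector_db`, and `tools` objects below are hypothetical placeholders standing in for a model endpoint, a vector database, and a tool registry; none of this comes from the talk itself.

```python
from dataclasses import dataclass, field

@dataclass
class AgentContext:
    goal: str
    history: list = field(default_factory=list)  # short-term working memory

def run_agent(goal, llm, vector_db, tools, max_steps=5):
    ctx = AgentContext(goal=goal)
    for _ in range(max_steps):
        # Retrieve: pull relevant context from a vector database (RAG).
        docs = vector_db.search(goal, top_k=3)
        # Plan: ask the model for the next action given goal, docs, and history.
        action = llm.plan(goal=goal, documents=docs, history=ctx.history)
        if action.name == "finish":
            return action.result
        # Tool invocation: call an external API or internal service.
        observation = tools[action.name](**action.args)
        # Feedback loop: record the outcome for the next planning step.
        ctx.history.append((action, observation))
    return None  # step budget exhausted without reaching the goal
```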
Storage's Critical Role in Agentic AI:
- Massive, high-speed, low-latency storage is essential.
- Scalable and flexible architectures are needed to manage diverse data types.
- Storage infrastructure must integrate seamlessly with compute and networking layers.
- Requirements include performance, security, data governance, and cost efficiency.
- Storage reduces "time to first token" by allowing faster swapping and retrieval of KV-cache data and data-lake content (see the worked estimate below).
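A back-of-envelope estimate shows why reloading a KV cache from flash beats rebuilding it. All numbers below are illustrative assumptions (a generic GQA transformer and an assumed NVMe read bandwidth), not figures from the session.

```python
# KV-cache size: 2 (K and V) * layers * KV heads * head dim * tokens * bytes.
n_layers, n_kv_heads, head_dim = 80, 8, 128
context_len = 32_768          # tokens held in context
bytes_per_elem = 2            # fp16

kv_bytes = 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem
print(f"KV cache: {kv_bytes / 2**30:.1f} GiB")        # ~10 GiB

nvme_bw = 12e9                # assumed 12 GB/s sequential read from flash
print(f"Reload from flash: {kv_bytes / nvme_bw:.2f} s")  # well under a second
```

Rebuilding the same cache means re-running prefill over all 32k tokens on the GPU, which typically costs far more time and power than the sub-second reload.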
AMD's Perspective:
- Agentic AI requires systems that can access, use, and update data rapidly.
- The AI data path involves three key stages (sketched in the pipeline example after this list):
- Data Lake: Fast loading and high bandwidth are critical for sequential data (documents, photos, videos).
- Data Preparation: Fast I/O, high memory bandwidth, and CPU orchestration are needed to clean, label, cache, and transform data in memory.
- Inference: Very low latencies and high memory traffic are essential for real-time responses.
- AMD's solutions include:
- EPYC CPUs: High PCIe lane counts, high core counts, and strong memory controllers for pre-processing and storage I/O management.
- GPU Families: Direct data path from storage to GPU memory for faster data movement with less CPU usage.
- Smart NICs and DPUs: Secure data transfer and offload storage tasks.
- ROCm software stack: Connects everything to leverage AMD hardware.
- AMD is positioned to support all stages of AI storage, collaborating with partners.
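The three-stage data path can be pictured as a simple staged pipeline. The sketch below is illustrative only, with hypothetical function names; it does not use any AMD-specific API such as ROCm.

```python
def read_data_lake(paths):
    # Stage 1 - data lake: large sequential reads; bandwidth-bound.
    for path in paths:
        with open(path, "rb") as f:
            yield f.read()

def prepare(raw_items):
    # Stage 2 - data preparation: CPU-side clean/label/transform in memory.
    for blob in raw_items:
        yield blob.decode("utf-8", errors="ignore").strip()

def infer(model, batches):
    # Stage 3 - inference: latency-bound; inputs must already be staged
    # close to the GPU so this loop never stalls on storage I/O.
    return [model(batch) for batch in batches]

# Usage with a hypothetical `model` callable:
# results = infer(model, prepare(read_data_lake(["doc1.txt", "doc2.txt"])))
```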
SanDisk's Perspective:
- Agentic AI has "agency" – it can act autonomously and make decisions with minimal human intervention.
- Inference has evolved from perception to generative AI to complex reasoning with Agentic AI, increasing complexity by 100x.
- 2025 is the year of inferencing and Agentic AI.
- Inference is a two-stage process:
- Prefill: Build context in the KV cache using input tokens.
- Decode: Output tokens and update the KV cache, enabling auto-regression.
- Complex use cases (multi-turn, multi-step, multi-user, multi-tenant, multimodal) lead to a multiplication of data sets that are hard to hold in HBM or system memory.
- Swapping KV caches and vector lakes in and out of storage is faster than rebuilding them (see the sketch after this list).
- Storage is a critical enabler for Agentic AI, reducing latency and unnecessary GPU computation.
- Enterprises are moving cold data into larger, flash-based data lakes.
- Two sets of SSD requirements: compute-intensive (high IOPS, random reads) and capacity-dense (sequential workloads).
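The prefill/decode split and KV-cache swapping can be sketched as follows; `model` and `kv_store` are hypothetical interfaces (the latter standing in for a flash-backed cache tier), not any vendor's API.

```python
def generate(model, kv_store, session_id, prompt_tokens, max_new_tokens=256):
    # "Swap in": restore a previously built KV cache from storage if one
    # exists, skipping a full prefill over the prompt.
    kv_cache = kv_store.load(session_id)
    if kv_cache is None:
        # Prefill: process all input tokens once to build the KV cache.
        kv_cache = model.prefill(prompt_tokens)
    output, token = [], prompt_tokens[-1]
    for _ in range(max_new_tokens):
        # Decode: emit one token and update the KV cache; the new token
        # feeds the next step (auto-regression).
        token, kv_cache = model.decode(token, kv_cache)
        if token == model.eos_token:
            break
        output.append(token)
    # "Swap out": persist the cache so the next turn in this session
    # (or another tenant's scheduler) can skip prefill entirely.
    kv_store.save(session_id, kv_cache)
    return output
```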
DDN's Perspective:
- DDN's experience in HPC translates well to AI.
- Traditional storage software must step up to deliver high performance and maximize GPU utilization.
- The "temperature" of data is rising: more of the data set is hot, requiring active management of the entire data set.
- DDN's EXAScaler and Infinia software take advantage of diverse protocols, media types, and data sizes.
- The software accelerates the GPU and ensures high utilization for the best ROI and lowest latency.
- DDN claims the lowest-latency object store.
- It seamlessly handles multi-protocol scenarios and accelerates the network.
- Minimizes data movement via SQL querying.
- Example of a 22x speedup in a RAG application by replacing AWS object storage with DDN's Infinia.
- Granular multi-tenancy architecture with SLAs and quality of service.
Super Micro's Perspective:
- Agentic AI can be deployed at various scales (cloud, on-prem, edge).
- Increased demand for performance, scale, and data transfer capabilities.
- Super Micro provides infrastructure solutions (a deep portfolio of data center building blocks).
- Critical building blocks include compute, storage, networking fabric, power, and cooling.
- Offers rack-scale solutions that simplify deployment and reduce overall cost.
- Reference architecture featuring AMD-based GPU systems, 800G network switches, and pre-validated storage solutions.
- Partnership with DDN to offer large-scale storage solutions (60 PB per rack).
Power Consumption Consideration:
- Power is top of mind. GPUs are 10x more power-hungry than CPUs.
- Power rates differ across geographies.
- Power accounts for more than 60% of a data center's ongoing opex.
- The new metrics are IOPS per watt and throughput per watt (see the example calculation below).
- SanDisk SSDs offer different power states to manage consumption.
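As a toy illustration of the per-watt metrics, with made-up drive figures rather than SanDisk specifications:

```python
# Compare storage efficiency in IOPS per watt and throughput per watt.
drives = {
    "compute-optimized SSD":  {"iops": 3_000_000, "gb_per_s": 14.0, "watts": 25.0},
    "capacity-optimized SSD": {"iops":   800_000, "gb_per_s":  7.0, "watts": 12.0},
}
for name, d in drives.items():
    print(f"{name}: {d['iops'] / d['watts']:,.0f} IOPS/W, "
          f"{d['gb_per_s'] / d['watts']:.2f} (GB/s)/W")
```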
Key Advice:
- AI is a forcing function in this rearchitecture of the data center.
- Organizations should consider scalability and openness.
- Think about upgrade paths and space limitations.
- Be mindful of data movement.
- Consider a software-defined approach to managing data.
- Broaden the view to managing data across multiple sites and interconnected data centers.
- Evaluate all available infrastructure options (CPU, GPU, NIC, DPU, storage hardware and software).
In essence, the video argues that Agentic AI represents a significant evolution in AI, requiring a fundamental rethinking of storage infrastructure. The panelists emphasize the need for high-performance, scalable, and efficient storage solutions that seamlessly integrate with other components of the AI ecosystem, and highlight the importance of vendor collaboration and software-defined approaches.
