[@ServeTheHomeVideo] Generative AI Enterprise Infrastructure - Open Storage Summit 2025 Session 6
Link: https://youtu.be/MIFhW42Db20
Short Summary
Number One Takeaway: Experimentation and learning by doing are crucial for navigating the complexities of generative AI deployment and realizing its full potential, especially considering the rapid evolution of the field.
Executive Summary: Enterprises are increasingly exploring generative AI, but face challenges in moving from proof of concept to production due to cost, data privacy concerns, and the overwhelming number of available models. Collaboration and simplification among infrastructure providers are helping enterprises build custom, on-premise solutions.
Key Quotes
Five direct quotes from the transcript that capture particularly valuable insights, interesting data points, or strong opinions:
- "When we talk about generative AI, we're referring to systems that don't just analyze data, they create it."
- "Enterprise AI often requires deep integration, more control and alignment with business outcomes. It's not just ChatGPT with a business skin on it. It's AI purpose-built for enterprise-grade reliability, performance, data privacy and of course ROI."
- "Estimates vary, but the global market for generative AI is projected to exceed $1.3 trillion by 2032, growing at a CAGR of north of 40%."
- "Between the AI accelerators, be it GPUs, TPUs or NPUs, CPUs, right? And getting an AI application to production at enterprise grade, there is a big gap. There's lots of things to be filled in to make sure it runs the way you want to in an enterprise context."
- "With Nutanix Unified Storage we were able to scale to over a thousand GPUs and saturate those GPUs. The next vendor was less than half of that."
Detailed Summary
A detailed summary of the session, focusing on key topics, arguments, and information:
Introduction to Generative AI in the Enterprise:
- Rob Strechay (theCUBE Research) introduces the session, which focuses on storage infrastructure for enterprise generative AI applications.
- Generative AI (GenAI) is described as a foundational shift in how intelligence is built and deployed.
Defining Generative AI vs. Predictive AI:
- Generative AI: Creates new data (code, text, designs, decisions) that is statistically coherent and contextually rich.
- Predictive AI: Excels at classification, regression, and forecasting (e.g., fraud detection, sales predictions), but doesn't create new content.
- Generative AI uses transformer architectures, massive datasets, and self-supervised learning, whereas predictive AI often relies on classical deep learning (convolutional neural networks).
Enterprise AI Context:
- Enterprise AI is tailored for business processes, decision automation, domain-specific reasoning, and governance.
- Integrated into ERP, CRM, supply chain, and compliance workflows.
- Requires deep integration, control, and alignment with business outcomes; not just consumer-grade GenAI with a business veneer.
- Focus on enterprise-grade reliability, performance, data privacy, and ROI.
Deployment Models (GenAI):
- Public cloud (leading early adoption due to scale, elasticity, and APIs).
- Private cloud environments.
- On-premise (especially in regulated sectors or where latency and data control are critical).
- Growing demand for hybrid and edge AI deployment models (manufacturing, finance, government).
- Investment in AI-optimized systems and inference accelerators is increasing.
Agentic AI:
- Agentic AI is the next evolution beyond GenAI, embedding autonomous decision-making capabilities.
- Systems not only generate content but also take actions to achieve goals without constant human intervention.
- Scales GenAI capabilities into more complex workflows involving predictive analytics and real-time operational adjustments.
Market Projections:
- Global market for GenAI projected to exceed $1.3 trillion by 2032, growing at a CAGR of over 40%.
- Enterprise use cases (code generation, customer support, legal drafting, analytics augmentation) will drive significant value.
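The implied growth rate can be sanity-checked with compound-growth arithmetic. As a minimal sketch, assuming a starting market size of roughly $40 billion in 2022 (an assumed base figure, not stated in the session), the endpoint and CAGR are consistent:

```python
# Sanity check of the compound annual growth rate (CAGR) behind the
# "$1.3T by 2032" projection. The ~$40B 2022 base is an assumed figure,
# not one given in the video.
base_2022 = 40e9        # assumed starting market size, USD
target_2032 = 1.3e12    # projected market size, USD
years = 10

cagr = (target_2032 / base_2022) ** (1 / years) - 1
print(f"Implied CAGR: {cagr:.1%}")  # roughly 42%, i.e. "north of 40%"
```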
Super Micro's View (Wendell):
- Discusses the process of developing and deploying enterprise GenAI applications.
- References IDC research commissioned by Super Micro.
- Breaks down the process into three phases:
- Proof of Concept: Prototyping, use case analysis, stakeholder alignment, defining data strategy (identifying data sources, ingestion into data lakes). Result is a working prototype model.
- Production Level Training: Using more data and enhanced models for production-level training. Requires planning a deployment strategy (on-premise, colo, public/private cloud). Key considerations are cost, speed, security, and data governance.
- Inference: Deploying the model for inference. Analyze the maximum concurrent query rate to size the equipment and infrastructure appropriately. Determine whether the training infrastructure is also adequate for inference (high volumes may need dedicated infrastructure).
- Describes the GenAI solution stack (data management layer, compute resources, models, software layer, application).
- Data Management: All-flash storage-optimized servers and high-capacity disk; object storage from MinIO.
- Compute Resources: Dedicated GPU servers (Super Micro has a variety with up to 8-way GPU servers) and compute servers (Intel Xeon processors) for data processing.
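The inference-sizing step above (analyzing the maximum concurrent query rate) can be sketched as back-of-envelope capacity math. All throughput and latency numbers below are illustrative assumptions, not Super Micro figures:

```python
# Back-of-envelope inference sizing: how many GPU servers are needed for
# a target peak query load. All performance numbers are illustrative
# assumptions for the sketch, not measured figures.
import math

peak_concurrent_queries = 500        # assumed peak concurrent load
avg_latency_s = 2.0                  # assumed end-to-end latency per query
queries_per_gpu_per_s = 4.0          # assumed sustained throughput per GPU
gpus_per_server = 8                  # e.g. an 8-way GPU server
headroom = 1.3                       # 30% headroom for bursts/failover

required_qps = peak_concurrent_queries / avg_latency_s   # Little's law: L = lambda * W
gpus_needed = math.ceil(required_qps * headroom / queries_per_gpu_per_s)
servers_needed = math.ceil(gpus_needed / gpus_per_server)
print(f"{required_qps:.0f} QPS -> {gpus_needed} GPUs -> {servers_needed} servers")
```

With these assumptions, 500 concurrent queries at 2 s each work out to 250 QPS, which (with headroom) lands at 11 eight-way servers; swapping in measured numbers from a proof of concept is the whole point of the exercise.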
Nutanix AI Platform (Kanjin):
- Nutanix is a hybrid cloud provider with a platform that works across on-premises, edge, and public cloud locations (Azure, AWS, soon Google Cloud).
- Nutanix Kubernetes Platform (from D2IQ acquisition) offers a mature container management platform.
- Focus on filling the gap between AI accelerators (GPUs, TPUs, NPUs, CPUs) and enterprise-grade AI applications in production.
- Addresses resilience, security, governance, and cost management for enterprise applications.
- Provides an operational Kubernetes platform to manage complex Kubernetes deployments.
- Nutanix cloud infrastructure provides resilient infrastructure everywhere.
- Nutanix enterprise AI provides centralized inferencing services and access to models from Hugging Face and Nvidia's NGC.
- Provides solutions for structured and unstructured data via Database Services and Unified Storage.
- Performance benchmark (MLCommons) showed Nutanix Unified Storage scaling to over 1,000 GPUs with high saturation for an image classification workload (files workload).
MinIO's Role in the AI Data Infrastructure (Keith):
- MinIO is a software-defined object storage leader used by a significant portion of Fortune 500/100 companies.
- AIStor is MinIO's commercial offering for generative AI.
- Visual depiction of an AI data infrastructure from a workloads perspective.
- AIStor supports Hugging Face and implements the Hugging Face APIs: it saves initial and subsequent models to AIStor and prevents accidental pushing of models back to the hub.
- Ideal backbone for a data lakehouse and implements table formats like Iceberg, Hudi and Delta Lake.
- Offers S3 compatibility for MLOps vendors, versioning of data sets, models, and training artifacts (integrations with Kubeflow, MLFlow, MLRun).
- Ingestion via a cluster of CPUs (don't skimp) with optional data cleaning/pre-processing. Supports S3, SFTP, and Kafka.
- Training with a cluster of GPUs (a good network is essential). Implements S3 over RDMA and GPUDirect for high-speed data transfer.
- Vector databases are essential for retrieval augmented generation (RAG), requiring storage for custom corpus documents and index files (compatibility with LanceDB, Milvus, Weaviate).
- Inference servers for LLMs need a KV cache (MinIO is working on a NIXL-compliant KV cache). Consider inference logs for tracking requests/responses and improving model accuracy.
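The RAG point above, a vector database holding embeddings of a custom corpus that is queried at inference time, can be illustrated with a toy in-memory retriever. Real deployments would use LanceDB, Milvus, or Weaviate with embeddings from a learned model; the hand-rolled vectors here are stand-ins:

```python
# Toy retrieval-augmented generation (RAG) retriever: rank documents by
# cosine similarity to a query embedding. The embeddings are hand-rolled
# stand-ins for real embedding-model output.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# (document text, embedding) pairs -- in production these live in a
# vector database backed by storage for corpus documents and index files.
corpus = [
    ("GPU servers for training",      [0.9, 0.1, 0.0]),
    ("Object storage for data lakes", [0.1, 0.9, 0.1]),
    ("Kubernetes for orchestration",  [0.0, 0.2, 0.9]),
]

def retrieve(query_vec, k=1):
    # Return the k most similar documents to the query embedding.
    ranked = sorted(corpus, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

print(retrieve([0.2, 0.8, 0.1]))  # -> ['Object storage for data lakes']
```

The retrieved documents are then prepended to the LLM prompt, which is why the vector index and the underlying corpus both need enterprise-grade storage.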
Intel's Contribution (Gary):
- Latest generation of Xeon data center processors for building storage infrastructure.
- Generative AI is growing rapidly, requiring large data bandwidth, high throughput, and low latency data connections.
- Intel Xeon 6 processor meets demands of high bandwidth and IOPS.
- Supports MRDIMMs with 2x DDR5 bandwidth.
- Some LLMs and use cases can be accelerated using the Xeon 6 processor due to on-board accelerators.
Customer Challenges in Getting to Production:
- Cost: High cost and unpredictability of cost (especially with cloud token-based pricing).
- Data Privacy: Concerns about data staying within the enterprise and needing air-gapped solutions.
- Too Much Choice/Change: Vast number of models on Hugging Face, difficulty in choosing the right model, constant optimization possibilities.
- Utilization of Existing Hardware: Wanting to leverage existing CPU infrastructure.
- Data center refresh: Replacing old servers with new ones can free up rack space and reduce costs.
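The cost-unpredictability point can be made concrete with simple arithmetic: under token-based pricing, cost scales linearly with usage, so a traffic spike is a bill spike. The per-token price and volumes below are hypothetical placeholders, not figures from the session:

```python
# Why token-based pricing is hard to budget: monthly cost scales linearly
# with query volume. All prices and volumes are hypothetical placeholders.
price_per_1k_tokens = 0.01          # hypothetical blended $/1K tokens
tokens_per_query = 1_500            # assumed prompt + completion tokens

def monthly_cost(queries_per_day):
    tokens = queries_per_day * 30 * tokens_per_query
    return tokens / 1_000 * price_per_1k_tokens

for qpd in (10_000, 100_000, 1_000_000):   # 100x usage -> 100x cost
    print(f"{qpd:>9,} queries/day -> ${monthly_cost(qpd):>10,.0f}/month")
```

A fixed on-premise deployment inverts this: capacity is the constraint, but the monthly cost is known in advance, which is the predictability argument made later in the session.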
Future of AI & Infrastructure:
- Customization of models (especially open-source models) for specific business needs.
- Companies getting smarter about right-sizing AI for their needs (smaller models, optimized infrastructure).
Nutanix Customer Value:
- Customers want control over data, costs, and digital sovereignty.
- Predictable cost models for inferencing, as opposed to cloud token-based pricing.
AI-Ready Data (MinIO):
- AI-ready data infrastructure provides AI-ready data.
- Data should be centralized under one roof.
- Infrastructure needs to scale easily to handle the explosion of unstructured data.
Advice to Organizations on the GenAI/Agentic Path:
- Experiment (Gary): Take the time to experiment to uncover tradeoffs in performance, power, and cost. Predictability is difficult until you develop a POC.
- Centralize Data (Keith): Get your data under one roof on a scalable solution.
- Use Enterprise Checklists, Start Phased Rollout (Kanjin): Treat GenAI deployment like any other enterprise application. Phase the rollout to learn from each phase. Tap into the creativity of your users.
- Learn by Doing (Wendell): We've come a long way in ease of use for developing and deploying GenAI. Developments are still happening. Develop prototypes to better understand user needs and avoid wasting time on things that don't work.
