WEKA Unveils Industry's First AI Storage Cluster Built on NVIDIA Grace CPU Superchips
Company Previews New Solution With NVIDIA, Arm, and Supermicro at Supercomputing 2024 to Deliver Exceptional Performance Density and Energy Savings for Enterprise AI Deployments
ATLANTA and CAMPBELL, Calif., Nov. 20, 2024 /PRNewswire/ -- From Supercomputing 2024: WEKA, the AI-native data platform company, previewed the industry's first high-performance storage solution for the NVIDIA Grace™ CPU Superchip. The solution will run on a powerful new Supermicro storage server that combines WEKA® Data Platform software, the NVIDIA Grace CPU Superchip with its Arm® Neoverse™ V2 cores, and NVIDIA ConnectX-7 and BlueField-3 networking to accelerate enterprise AI workloads with unmatched performance density and power efficiency.
Fueling the Next Generation of AI Innovation
Today's AI and high-performance computing (HPC) workloads demand lightning-fast data access, but most data centers face increasing space and power constraints.
NVIDIA Grace integrates the level of performance offered by a flagship two-socket x86-64 workstation or server platform into a single module. Grace CPU Superchips are powered by 144 high-performance Arm Neoverse V2 cores that deliver 2x the energy efficiency of traditional x86 servers. NVIDIA ConnectX-7 NICs and BlueField-3 SuperNICs feature purpose-built RDMA/RoCE acceleration, delivering high-throughput, low-latency network connectivity at speeds of up to 400Gb/s. Running the WEKA Data Platform's revolutionary zero-copy software architecture on the Supermicro Petascale storage server minimizes I/O bottlenecks and reduces AI pipeline latency, significantly enhancing GPU utilization and accelerating AI model training and inference. The result is a dramatic improvement in time to first token, discoveries, and insights, along with lower power consumption and associated costs.
Key benefits of the solution include:
- Exceptional performance density: two-socket x86 server-class performance integrated into a single Grace CPU Superchip module
- Up to 2x the energy efficiency of traditional x86 servers
- High-throughput, low-latency networking at speeds of up to 400Gb/s
- Minimized I/O bottlenecks and reduced AI pipeline latency for higher GPU utilization and faster model training and inference
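WEKA's zero-copy data path is proprietary and its internals are not described in this release, but the underlying idea can be illustrated generically: moving bytes between storage and the network without staging them in user-space buffers. The Python sketch below is a minimal illustration only, using the Linux sendfile(2) system call; the function and descriptors are hypothetical and are not part of any WEKA API.

```python
import os

# Minimal zero-copy illustration: os.sendfile() asks the kernel to move
# bytes from a file to a socket directly, skipping the extra user-space
# copy that a read()/send() loop would incur. Generic sketch, not WEKA's
# implementation.
def serve_file(sock_fd: int, path: str) -> int:
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        sent = 0
        while sent < size:
            # sendfile(out_fd, in_fd, offset, count) returns bytes moved
            sent += os.sendfile(sock_fd, fd, sent, size - sent)
        return sent
    finally:
        os.close(fd)
```

Avoiding the intermediate copy keeps CPU cycles and memory bandwidth free for the AI pipeline itself, which is what reduced I/O latency and higher GPU utilization depend on.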
"AI is transforming how enterprises around the world innovate, create, and operate, but the sharp increase in its adoption has drastically increased data center energy consumption, which is expected to double by 2026, according to the International Atomic Agency," said Nilesh Patel, chief product officer at WEKA. "WEKA is excited to partner with NVIDIA, Arm, and Supermicro to develop high-performance, energy-efficient solutions for next-generation data centers that drive enterprise AI and high-performance workloads while accelerating the processing of large amounts of data and reducing time to actionable insights."
"WEKA has developed a powerful storage solution with Supermicro that integrates seamlessly with the NVIDIA Grace CPU Superchip to improve the efficiency of at-scale, data-intensive AI workloads. The solution will provide fast data access while reducing energy consumption, enabling data-driven organizations to turbocharge their AI infrastructure," said Ivan Goldwasser, director of data center CPUs at NVIDIA.
"Supermicro's upcoming ARS-121L-NE316R Petascale storage server is the first storage optimized server using the NVIDIA Grace Superchip CPU," said Patrick Chiu, Senior Director, Storage Product Management, Supermicro. "The system design features 16 high-performance Gen5 E3.S NVMe SSD bays along with three PCIe Gen 5 networking slots, which support up to two NVIDIA ConnectX 7 or BlueField-3 SuperNIC networking adapters and one OCP 3.0 network adapter. The system is ideal for high-performance storage workloads like AI, data analytics, and hyperscale cloud applications. Our collaboration with NVIDIA and WEKA has resulted in a data platform enabling customers to make their data centers more power efficient while adding new AI processing capabilities."
"AI innovation requires a new approach to silicon and system design that balances performance with power efficiency. Arm is proud to be working with NVIDIA, WEKA and Supermicro to deliver a highly performant enterprise AI solution that delivers exceptional value and uncompromising energy efficiency," said David Lecomber, director for HPC at Arm.
The storage solution from WEKA and Supermicro using NVIDIA Grace CPU Superchips will be commercially available in early 2025. Supercomputing 2024 attendees can visit WEKA in Booth #1931 for more details and a demo of the new solution.
About WEKA
WEKA is architecting a new approach to the enterprise data stack built for the AI era. The WEKA® Data Platform sets the standard for AI infrastructure with a cloud and AI-native architecture that can be deployed anywhere, providing seamless data portability across on-premises, cloud, and edge environments. It transforms legacy data silos into dynamic data pipelines that accelerate GPUs, AI model training and inference, and other performance-intensive workloads, enabling them to work more efficiently, consume less energy, and reduce associated carbon emissions. WEKA helps the world's most innovative enterprises and research organizations overcome complex data challenges to reach discoveries, insights, and outcomes faster and more sustainably – including 12 of the Fortune 50. Visit www.weka.io to learn more or connect with WEKA on LinkedIn, X, and Facebook.
WEKA was recognized as a Visionary in the 2024 Gartner® Magic Quadrant™ for File and Object Storage Platforms - read the report.
WEKA and the WEKA logo are registered trademarks of WekaIO, Inc. Other trade names used herein may be trademarks of their respective owners.
WEKA Debuts New Solution Blueprint to Simplify AI Inferencing at Scale
WARRP Reference Architecture Provides Comprehensive Modular Solution That Accelerates the Development of RAG-based Inferencing Environments
ATLANTA and CAMPBELL, Calif., Nov. 20, 2024 /PRNewswire/ -- From Supercomputing 2024: WEKA, the AI-native data platform company, debuted a new reference architecture solution to simplify and streamline the development and implementation of enterprise AI inferencing environments. The WEKA AI RAG Reference Platform (WARRP) provides generative AI (GenAI) developers and cloud architects with a design blueprint for the development of a robust inferencing infrastructure framework that incorporates retrieval-augmented generation (RAG), a technique used in the AI inference process to enable large language models (LLMs) to gather new data from external sources.
The Criticality of RAG in Building Safe, Reliable AI Operations
According to a recent study of global AI trends conducted by S&P Global Market Intelligence, GenAI has rapidly emerged as the most highly adopted AI modality, eclipsing all other AI applications in the enterprise.[1]
A primary challenge enterprises face when deploying LLMs is ensuring they can effectively retrieve and contextualize new data across multiple environments and from external sources to aid in AI inference. RAG is the leading technique for addressing this challenge: it enhances trained AI models by safely retrieving new insights from external data sources. Using RAG in the inferencing process can help reduce AI model hallucinations and improve output accuracy, reliability, and richness, reducing the need for costly retraining cycles.
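In practice, a RAG pipeline embeds the user's question, retrieves the most relevant documents from a vector store, and injects them into the prompt before generation. The minimal Python sketch below shows this pattern; the embed(), vector_store.search(), and llm.generate() helpers are hypothetical placeholders, not components of WARRP or any specific product.

```python
# Minimal RAG inference loop (illustrative sketch; embed, vector_store,
# and llm are hypothetical stand-ins for real components).
def answer_with_rag(question: str, vector_store, llm, embed, k: int = 3) -> str:
    query_vec = embed(question)                      # 1. embed the query
    docs = vector_store.search(query_vec, limit=k)   # 2. retrieve top-k context
    context = "\n\n".join(d.text for d in docs)
    prompt = (                                       # 3. augment the prompt
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm.generate(prompt)                      # 4. generate grounded answer
```

Because the model answers from retrieved context rather than from its parametric memory alone, stale or missing knowledge can be fixed by updating the document store instead of retraining the model.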
However, creating robust production-ready inferencing environments that can support RAG frameworks at scale is complex and challenging, as architectures, best practices, tools, and testing strategies are still rapidly evolving.
A Comprehensive Blueprint for Inferencing Acceleration
With WARRP, WEKA has defined an infrastructure-agnostic reference architecture that can be leveraged to build and deploy production-quality, high-performance RAG solutions at scale.
Designed to help organizations quickly build and implement RAG-based AI inferencing pipelines, WARRP provides a comprehensive blueprint of modular components that can be used to quickly develop and deploy a world-class AI inference environment optimized for workload portability, distributed global data centers and multicloud environments.
The WARRP reference architecture builds on WEKA® Data Platform software running on an organization's preferred cloud or server hardware as its foundational layer. On top of that, it incorporates class-leading enterprise AI frameworks from NVIDIA, including NVIDIA NIM™ microservices and NVIDIA NeMo™ Retriever, both part of the NVIDIA AI Enterprise software platform; advanced AI workload and GPU orchestration capabilities from Run:ai; and popular commercial and open-source technologies such as Kubernetes for container orchestration and the Milvus vector database for data ingestion.
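As an illustration of the data-ingestion layer, the sketch below loads document embeddings into Milvus with the pymilvus client and runs a similarity search. The URI, collection name, and 768-dimension embeddings are assumptions chosen for illustration; this is not a configuration taken from the WARRP reference architecture itself.

```python
import random
from pymilvus import MilvusClient

# Minimal Milvus ingestion/retrieval sketch (assumed URI, collection
# name, and embedding dimension; not the WARRP configuration).
client = MilvusClient(uri="http://localhost:19530")
client.create_collection(collection_name="warrp_docs", dimension=768)

# A real pipeline would embed document chunks with a model (e.g. via
# NVIDIA NeMo Retriever); random vectors keep this sketch self-contained.
docs = [
    {"id": i, "vector": [random.random() for _ in range(768)],
     "text": f"document chunk {i}"}
    for i in range(3)
]
client.insert(collection_name="warrp_docs", data=docs)

# Similarity search: find the chunks nearest a query embedding.
query_vec = [random.random() for _ in range(768)]
hits = client.search(
    collection_name="warrp_docs",
    data=[query_vec],
    limit=2,
    output_fields=["text"],
)
print(hits)
```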
"As the first wave of generative AI technologies began moving into the enterprise in 2023, most organizations' compute and data infrastructure resources were focused on AI model training. As GenAI models and applications have matured, many enterprises are now preparing to shift these resources to focus on inferencing but may not know where to begin," said Shimon Ben-David, chief technology officer at WEKA. "Running AI inferencing at scale is extremely challenging. We are developing the WEKA AI RAG Architecture Platform on leading AI and cloud infrastructure solutions from WEKA, NVIDIA, Run:ai, Kubernetes, Milvus, and others to provide a robust production-ready blueprint that streamlines the process of implementing RAG to improve the accuracy, security and cost of running enterprise AI models."
WARRP delivers a flexible, modular framework that can support a variety of LLM deployments, offering scalability, adaptability, and exceptional performance in production environments.
"As AI adoption accelerates, there is a critical need for simplified ways to deploy production workloads at scale. Meanwhile, RAG-based inferencing is emerging as an important frontier in the AI innovation race, bringing new considerations for an organization's underlying data infrastructure," said Ronen Dar, chief technology officer at Run:ai. "The WARRP reference architecture provides an excellent solution for customers building an inference environment, providing an essential blueprint to help them develop quickly, flexibly and securely using industry-leading components from NVIDIA, WEKA and Run:ai to maximize GPU utilization across private, public and hybrid cloud environments. This combination is a win-win for customers who want to outpace their competition on the cutting edge of AI innovation."
"Enterprises are looking for a simple way to embed their data to build and deploy RAG pipelines," said Amanda Saunders, director of Enterprise Generative AI software, NVIDIA. "Using NVIDIA NIM and NeMo with WEKA, will give enterprise customers a fast path to develop, deploy and run high-performance AI inference and RAG operations at scale."
The first release of the WARRP reference architecture is now available for free download. Visit https://www.weka.io/resources/reference-architecture/warrp-weka-ai-rag-reference-platform/ to obtain a copy.
Supercomputing 2024 attendees can visit WEKA in Booth #1931 for more details and a demo of the new solution.
Supporting AI Cloud Service Provider Quotes
Applied Digital
"As companies increasingly harness advanced AI and GenAI inferencing to empower their customers and employees, they recognize the benefits of leveraging RAG for greater simplicity, functionality and efficiency," said Mike Maniscalco, chief technology officer at Applied Digital. "WEKA's WARRP stack provides a highly useful reference framework to deliver RAG pipelines into a production deployment at scale, supported by powerful NVIDIA technology and reliable, scalable cloud infrastructure."
Ori Cloud
"Leading GenAI companies are running on Ori Cloud to train the world's largest LLMs and achieving maximum GPU utilization thanks to our integration with the WEKA Data Platform," said Mahdi Yahya, founder and chief executive officer at Ori Cloud. "We look forward to working with WEKA to build robust inference solutions using the WARRP architecture to help Ori Cloud customers maximize the benefits of RAG pipelines to accelerate their AI innovation."
Yotta
"To run AI effectively, speed, flexibility, and scalability are required. Yotta's AI solutions, powered by NVIDIA GPUs and built on the WEKA Data Platform, are helping organizations to push the boundaries of what's possible in AI, offering unparalleled performance and flexible scale," said Sunil Gupta, chief executive officer at Yotta. "We look forward to collaborating with WEKA to further enhance our Inference-as-a-Service offerings for natural-language processing, computer vision, and generative AI leveraging the WARRP reference architecture and NVIDIA NIM microservices."
About WEKA
WEKA is architecting a new approach to the enterprise data stack built for the AI era. The WEKA® Data Platform sets the standard for AI infrastructure with a cloud and AI-native architecture that can be deployed anywhere, providing seamless data portability across on-premises, cloud, and edge environments. It transforms legacy data silos into dynamic data pipelines that accelerate GPUs, AI model training and inference, and other performance-intensive workloads, enabling them to work more efficiently, consume less energy, and reduce associated carbon emissions. WEKA helps the world's most innovative enterprises and research organizations overcome complex data challenges to reach discoveries, insights, and outcomes faster and more sustainably – including 12 of the Fortune 50. Visit www.weka.io to learn more or connect with WEKA on LinkedIn, X, and Facebook.
WEKA and the WEKA logo are registered trademarks of WekaIO, Inc. Other trade names used herein may be trademarks of their respective owners.
[1] 2024 Global Trends in AI, September 2024, S&P Global Market Intelligence