Tag: LPU

Too little, too late? Bertha LPU joins Groq's ultra-fast LPU as the challenge to GPU giant Nvidia grows


South Korean AI startup HyperAccel partnered with platform-based SoC and ASIC designer SEMIFIVE in January 2024 to develop the Bertha LPU.

Designed for LLM inference, Bertha offers “low cost, low latency, and domain-specific features,” with the goal of replacing “high-cost, low-efficiency” GPUs. SEMIFIVE reports that the design work is now complete and that the processor, built on 4 nm technology, will enter mass production in early 2026.

HyperAccel claims Bertha can deliver up to twice the performance and a price-performance ratio 19 times better than that of a typical supercomputer, but it faces stiff competition in a market where Nvidia's GPUs are firmly established.

Facing challenges

“We are delighted to work with SEMIFIVE, a leading provider of SoC platforms and comprehensive ASIC design solutions, to develop Bertha for mass production,” said Ju Young Kim, CEO of HyperAccel. “By collaborating with SEMIFIVE, we are pleased to offer customers AI semiconductors that deliver more cost-effective and energy-efficient LLM features compared with GPU platforms. This advancement will significantly reduce data center operating expenses and expand our business to other industries that require LLMs.”

Groq, a Silicon Valley-based AI competitor led by former Google engineer and CEO Jonathan Ross, has already made major strides with its own LPU product, which focuses on high-speed AI inference.

Groq's technology, which provides on-premises and cloud inference at scale for AI applications, has already found a large audience, with more than 525,000 developers using the LPU since its launch in February. Bertha's late arrival may put it at a disadvantage.

Brandon Chu, CEO and co-founder of SEMIFIVE, is more optimistic about Bertha's chances. “HyperAccel is a company with the most efficient and scalable LPU technology for LLMs. With demand for LLM computing growing exponentially, HyperAccel has the potential to become a new force in global processor infrastructure,” he said.

Bertha's focus on efficiency could appeal to companies looking for ways to cut operating costs, but given Nvidia's unmatched dominance, the HyperAccel product may find itself fighting for a niche in an already crowded space rather than becoming a leader in artificial intelligence.




This startup wants to take on Nvidia with a server-on-a-chip to eliminate what it calls an already flawed system — faster GPU, CPU, LPU, TPU or NIC will not deliver the leap that many firms are aiming for


According to Israeli startup NeuReality, many AI possibilities aren’t fully realized due to the cost and complexity of building and scaling AI systems.

Current solutions are not optimized for inference and rely on general-purpose CPUs, which were not designed for AI. Moreover, CPU-centric architectures necessitate multiple hardware components, resulting in underutilized Deep Learning Accelerators (DLAs) due to CPU bottlenecks.

NeuReality’s answer to this problem is the NR1 AI Inference Solution, a combination of purpose-built software and a unique network-addressable inference server-on-a-chip. NeuReality says this will deliver improved performance and scalability at a lower cost alongside reduced power consumption.

An express lane for large AI pipelines

“Our disruptive AI Inference technology is unbound by conventional CPUs, GPUs, and NICs,” said NeuReality’s CEO Moshe Tanach. “We didn’t try to just improve an already flawed system. Instead, we unpacked and redefined the ideal AI Inference system from top to bottom and end to end, to deliver breakthrough performance, cost savings, and energy efficiency.”

The key to NeuReality’s solution is a Network Addressable Processing Unit (NAPU), a new architecture design that leverages the power of DLAs. The NeuReality NR1, a network addressable inference Server-on-a-Chip, has an embedded Neural Network Engine and a NAPU.

This new architecture enables inference through hardware with AI-over-Fabric, an AI hypervisor, and AI-pipeline offload.

The company has two products that use its Server-on-a-Chip: the NR1-M AI Inference Module and the NR1-S AI Inference Appliance. The former is a full-height, double-wide PCIe card that contains one NR1 NAPU system-on-a-chip and a network-addressable inference server that can connect to an external DLA. The latter is an AI-centric inference server containing NR1-M modules with the NR1 NAPU. NeuReality claims the server “lowers cost and power performance by up to 50X but doesn’t require IT to implement for end users.”

“Investing in more and more DLAs, GPUs, LPUs, TPUs… won’t address your core issue of system inefficiency,” said Tanach. “It’s akin to installing a faster engine in your car to navigate through traffic congestion and dead ends – it simply won’t get you to your destination any faster. NeuReality, on the other hand, provides an express lane for large AI pipelines, seamlessly routing tasks to purpose-built AI devices and swiftly delivering responses to your customers, while conserving both resources and capital.”

NeuReality recently secured $20 million in funding from the European Innovation Council (EIC) Fund, Varana Capital, Cleveland Avenue, XT Hi-Tech and OurCrowd.

Groq LPU (Language Processing Unit) performance tested – capable of 500 tokens per second

By Miranda Cosgrove
February 28, 2024


A new player has entered the field of artificial intelligence in the form of the Groq LPU (Language Processing Unit). In testing, it processed over 500 tokens per second using the Llama 7B model. The LPU is powered by a chip designed specifically for fast inference. Such tasks are crucial for large language models, which generate output sequentially, and this focus sets the Groq LPU apart from traditional GPUs and CPUs, which are more commonly associated with model training.

The Groq LPU has 230 MB of on-die SRAM per chip and memory bandwidth of up to 8 terabytes per second. These figures address two of the most critical challenges in AI processing: compute density and memory bandwidth. Its development team describes it as “purpose-built for inference performance and precision, all in a simple, efficient design.”
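To put the on-die SRAM figure in perspective, the following back-of-envelope sketch estimates how many chips would be needed just to hold the weights of a 7-billion-parameter model. It assumes the 230 figure quoted above is in megabytes and that weights are stored at 2 bytes (16-bit) or 1 byte (8-bit) per parameter. Neither the precision nor the function name comes from the article, and a real deployment also needs room for activations and other state, so treat the result as a rough lower bound.

```python
def chips_to_hold_weights(num_params: int, bytes_per_param: int,
                          sram_bytes_per_chip: int) -> int:
    """Minimum number of chips whose combined SRAM can hold the weights alone."""
    weight_bytes = num_params * bytes_per_param
    # Integer ceiling division avoids floating-point rounding.
    return -(-weight_bytes // sram_bytes_per_chip)


SRAM_PER_CHIP = 230 * 10**6  # assumption: 230 MB of on-die SRAM per chip
PARAMS_7B = 7 * 10**9        # a 7-billion-parameter model

chips_16bit = chips_to_hold_weights(PARAMS_7B, 2, SRAM_PER_CHIP)
chips_8bit = chips_to_hold_weights(PARAMS_7B, 1, SRAM_PER_CHIP)
print(chips_16bit)  # 61
print(chips_8bit)   # 31
```

Under these assumptions, a single chip cannot hold a 7B model on its own, so serving one from on-die SRAM implies spreading the weights across many chips.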

Groq LPU Performance Analysis

The Groq API also performs well in real-time speech-to-speech applications. Pairing the Groq API with Faster Whisper for transcription and a local text-to-speech model has shown promising results in making AI interactions more fluid and natural. This is particularly relevant for applications that require real-time processing, such as virtual assistants and automated customer service tools.
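The pairing described above amounts to three stages chained together: transcription, a language-model reply, and speech synthesis. The sketch below is a hypothetical illustration, not the scripts used in the tests. The stage functions are passed in as plain callables so that any speech-to-text library, hosted LLM API, or local text-to-speech model could be plugged in, and every name here is an assumption introduced for the example.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class TurnResult:
    transcript: str
    reply_text: str
    reply_audio: bytes
    seconds: float


def speech_to_speech_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],   # speech-to-text stage
    generate: Callable[[str], str],       # LLM inference stage
    synthesize: Callable[[str], bytes],   # text-to-speech stage
) -> TurnResult:
    """Run one conversational turn and record its end-to-end latency."""
    start = time.perf_counter()
    transcript = transcribe(audio)
    reply_text = generate(transcript)
    reply_audio = synthesize(reply_text)
    return TurnResult(transcript, reply_text, reply_audio,
                      time.perf_counter() - start)


# Stand-in stages so the sketch runs without any external service.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_generate(text: str) -> str:
    return f"You said: {text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


result = speech_to_speech_turn(b"hello", fake_transcribe,
                               fake_generate, fake_synthesize)
print(result.reply_text)
```

Because the stages run one after another, the latency of the inference stage adds directly to the delay the user hears, which is why a fast inference backend matters for this kind of application.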


A key measure of performance in AI processing is token processing speed, and the Groq LPU has proven itself in this area. Compared with ChatGPT and various local models, the Groq API showed that it could significantly change how we engage with AI tasks. This was evident in an evaluation known as the chain prompting test, in which the Groq API was tasked with condensing lengthy texts into more concise versions. The test showcased not only the API’s speed but also its ability to handle complex text processing tasks efficiently.
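A chain prompting test of this kind can be sketched as a loop that feeds each condensed output back in as the next prompt while timing the calls. The code below is a minimal sketch under stated assumptions: `complete` is a hypothetical callable standing in for whichever model API is being tested, token counts are approximated by whitespace splitting, and the stub model exists only to make the example runnable.

```python
import time
from typing import Callable, List, Tuple


def chain_condense(text: str, complete: Callable[[str], str],
                   rounds: int) -> Tuple[List[str], float]:
    """Condense `text` repeatedly; return each output and tokens per second."""
    outputs: List[str] = []
    generated_tokens = 0
    current = text
    start = time.perf_counter()
    for _ in range(rounds):
        prompt = f"Condense the following text:\n\n{current}"
        current = complete(prompt)  # the output becomes the next input
        outputs.append(current)
        generated_tokens += len(current.split())  # rough token estimate
    elapsed = time.perf_counter() - start
    rate = generated_tokens / elapsed if elapsed > 0 else float("inf")
    return outputs, rate


def stub_complete(prompt: str) -> str:
    """Hypothetical stand-in model: keeps the first half of the words."""
    body = prompt.split("\n\n", 1)[1]
    words = body.split()
    return " ".join(words[: max(1, len(words) // 2)])


summaries, tokens_per_second = chain_condense(
    "one two three four five six seven eight", stub_complete, rounds=3
)
print(summaries)
```

Swapping `stub_complete` for a function that calls a real model would let the same loop compare backends on both output quality and measured tokens per second.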

It’s essential to understand that the Groq LPU is not designed for model training. Instead, it has carved out its own niche in the inference market, providing a specialized solution for those in need of rapid inference capabilities. This strategic focus allows the Groq LPU to offer something different from Nvidia’s training-focused technology.

The tests conducted with the Groq API give us a glimpse into the future of AI processing. With its emphasis on speed and efficiency, the Groq LPU is set to become a vital tool for developers and businesses looking to run real-time AI tasks. This is especially relevant as the demand for real-time AI solutions continues to grow.

For those who are eager to explore the technical details of the Groq API, the scripts used in the tests are available through a channel membership. This membership also provides access to a community GitHub and Discord, creating an ideal environment for ongoing exploration and discussion among tech enthusiasts.

The Groq LPU represents a significant step forward in AI processing. Its ability to perform rapid inference with high efficiency makes it an important addition to the evolving landscape of AI technologies. As the need for real-time AI solutions becomes more pressing, the specialized design of the Groq LPU positions it to play a key role in meeting these new challenges.
