Groq’s architecture delivers one quadrillion operations per second (1 PetaOp/s, or 1e15 ops/s) and is capable of up to 250 trillion floating-point operations per second (250 TFLOPS).
“Top GPU companies have been telling customers that they’d hoped to be able to deliver one PetaOp/s performance within the next few years; Groq is announcing it,” says Groq CEO Jonathan Ross. “The Groq architecture is many multiples faster than anything else available for inference, in terms of both low latency and inferences per second. We had first silicon back, first-day power-on, programs running in the first week, and samples to partners and customers in under six weeks, with A0 silicon going into production.”
Built with a software-first mindset, Groq’s TSP (Tensor Streaming Processor) architecture claims to achieve both compute flexibility and massive parallelism without the synchronization overhead of traditional GPU and CPU architectures.
Groq’s architecture supports both traditional and emerging machine learning models, and is currently in operation at customer sites in both x86 and non-x86 systems.
The architecture is designed specifically for the performance requirements of computer vision, machine learning, and other AI-related workloads.
Execution planning happens in software, freeing up silicon real estate otherwise dedicated to dynamic instruction execution.
The tight control afforded by this architecture yields deterministic processing, which is especially valuable for applications where safety and accuracy are paramount.
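To make the idea of software-side execution planning concrete, here is a toy sketch of static instruction scheduling: every operation is assigned a fixed start cycle at compile time, so total latency is known before the program ever runs. This is an illustration of the general technique, not Groq’s actual compiler or toolchain; the op names and latencies are invented for the example.

```python
# Toy static scheduler: each op is (latency_in_cycles, dependencies).
# Ops are listed in dependency order (Python dicts preserve insertion order).
OPS = {
    "load_a": (1, []),
    "load_b": (1, []),
    "mul":    (2, ["load_a", "load_b"]),
    "add":    (1, ["mul"]),
    "store":  (1, ["add"]),
}

def schedule(ops):
    """Assign each op a fixed start cycle based on its dependencies.
    Because the plan is computed ahead of time, latency is exact and
    repeatable -- no runtime arbitration or dynamic dispatch."""
    start = {}
    for name, (latency, deps) in ops.items():
        # An op starts once all of its dependencies have finished.
        start[name] = max((start[d] + ops[d][0] for d in deps), default=0)
    return start

plan = schedule(OPS)
total_cycles = max(plan[n] + OPS[n][0] for n in OPS)
print(plan)          # fixed start cycle for every op
print(total_cycles)  # program latency known before execution
```

In a dynamically scheduled processor, the equivalent timing depends on runtime hazards and arbitration; here the schedule (and therefore the result timing) is identical on every run, which is the determinism the paragraph above describes.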
Compared to complex traditional architectures based on CPUs, GPUs, and FPGAs, Groq’s chip also streamlines qualification and deployment, enabling customers to simply and quickly implement scalable, high-performance-per-watt systems.