
AWS and Cerebras partner for fastest AI inference via Bedrock with CS-3 systems

Amazon Web Services and Cerebras Systems have announced a collaboration to deliver AI inference for generative AI applications and large language model workloads. The solution will be deployed on Amazon Bedrock in AWS data centers and combines AWS Trainium-powered servers, Cerebras CS-3 systems, and Elastic Fabric Adapter (EFA) networking. Later this year, AWS plans to offer leading open-source LLMs and Amazon Nova running on Cerebras hardware.

David Brown, Vice President of Compute and ML Services at AWS, said inference speed remains a bottleneck for workloads such as real-time coding assistance. He added that splitting the workload across Trainium and CS-3 connected [...]

Andrew Feldman, Founder and CEO of Cerebras Systems, said the partnership will bring fast inference to enterprise customers in their existing AWS environments.

The solution uses inference disaggregation to separate prompt processing (prefill) from output generation (decode). Trainium is optimized for prefill, while CS-3 handles decode, which is memory-bandwidth intensive and typically accounts for most inference time. The two systems connect through low-latency, high-bandwidth EFA networking. The solution is built on the AWS Nitro System to provide the same security, isolation, and operational consistency as other AWS services.
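
For readers unfamiliar with inference disaggregation, the prefill/decode split described above can be illustrated with a toy Python sketch. Nothing here reflects the actual AWS or Cerebras implementation: the class names `PrefillWorker` and `DecodeWorker`, the `transfer` function standing in for the network hop, and the arithmetic "next token" rule are all hypothetical and chosen only to show the shape of the handoff.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KVCache:
    """Stand-in for the attention key/value state produced by prefill."""
    entries: List[int] = field(default_factory=list)


class PrefillWorker:
    """Hypothetical compute-bound stage: consumes the whole prompt in one pass."""

    def run(self, prompt_tokens: List[int]) -> KVCache:
        # A real system would run one batched forward pass over the prompt.
        return KVCache(entries=list(prompt_tokens))


class DecodeWorker:
    """Hypothetical bandwidth-bound stage: emits one token per step."""

    def run(self, cache: KVCache, max_new_tokens: int) -> List[int]:
        output: List[int] = []
        for _ in range(max_new_tokens):
            # Toy next-token rule (an assumption, not a model):
            # sum of the cached state modulo a small vocabulary size.
            next_token = sum(cache.entries) % 100
            output.append(next_token)
            # Each step rereads the full cache and then extends it.
            cache.entries.append(next_token)
        return output


def transfer(cache: KVCache) -> KVCache:
    """Stand-in for shipping the cache between devices over the network."""
    return KVCache(entries=list(cache.entries))


def generate(prompt_tokens: List[int], max_new_tokens: int) -> List[int]:
    cache = PrefillWorker().run(prompt_tokens)   # stage 1: prompt processing
    cache = transfer(cache)                      # handoff between systems
    return DecodeWorker().run(cache, max_new_tokens)  # stage 2: generation


if __name__ == "__main__":
    print(generate([1, 2, 3], 3))  # prints [6, 12, 24]
```

The point of the sketch is that prefill touches the prompt once, while decode revisits the growing cache on every generated token, which is why the decode stage tends to be limited by memory bandwidth rather than raw compute.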
Sources
Published by Tech & Business, a media brand covering technology and business. This story was sourced from HPCwire and reviewed by the T&B editorial agent team.