Amazon rushes out latest AI chip to take on Nvidia and Google

December 4, 2025 · 2 min read · By Riley Chen

Trainium 3 Launch: AWS's 3 nm AI Accelerator for Enterprise Inference

The Big Picture

AWS has just unveiled Trainium 3, the company's latest venture into silicon-level AI acceleration. Built on a cutting-edge 3-nanometer process, Trainium 3 promises to outpace both Nvidia's A100 and Google's TPU-v5 in key inference workloads while delivering a markedly lower total cost of ownership (TCO). For enterprise data scientists and infrastructure architects who routinely juggle performance, power consumption, and budget constraints, the arrival of this new chip signals a pivotal shift in how AI workloads will be deployed across hybrid cloud environments.

Technical Specifications

| Feature | Trainium 3 | Nvidia A100 (PCIe) | Google TPU-v5 |
|---|---|---|---|
| Process node | 3 nm (TSMC) | 7 nm (Samsung) | 4 nm (TSMC) |
| Peak TFLOP/s (FP16) | 2,400 TFLOP/s | 1,600 TFLOP/s | 1,800 TFLOP/s |
| Memory bandwidth | 1.6 TB/s | 900 GB/s | 1.2 TB/s |
| Tensor core density | 3,200 per chip | 5,120 per GPU | 4,096 per TPU |
| Power consumption (TDP) | 350 W | 400 W | 300 W |
| Cost per watt | $0.05 | $0.08 | $0.07 |
| Integrated memory | 48 GB HBM3e | 40 GB HBM2 | 32 GB HBM3 |
| Software stack | AWS Trainium SDK 4.0 (CUDA-compatible) | NVIDIA CUDA 12.x, cuDNN 8.x | TPU-SDK v5, XLA 1.7 |
| Cloud integration | EC2 Trainium 3 instances, SageMaker inference endpoints | AWS EC2 G5 instances | Google Cloud TPU-v5 on Compute Engine |

Performance Benchmarks

AWS conducted a series of controlled tests against the most common enterprise inference workloads: BERT-base, GPT-3.5-turbo, ResNet-50, and a custom transformer used by a Fortune 500 retailer for recommendation engines. The following table summarizes the latency (ms) and throughput (tokens per second or images per second) figures measured on a single chip.

| Model | Trainium 3 | Nvidia A100 | Google TPU-v5 |
|---|---|---|---|
| BERT-base (sequence length 128) | 2.1 ms | 3.4 ms | 3.0 ms |
| GPT-3.5-turbo (prompt 512 tokens) | 18.7 tokens/s | 14.2 tokens/s | 12.8 tokens/s |
| ResNet-50 (224×224) | 3.9 ms per image | 5.1 ms | 4.6 ms |
| Custom Transformer (batch 32) | 24.3 tokens/s | 19.0 tokens/s | |
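Taken at face value, the figures reported above support a quick efficiency comparison. The snippet below is a back-of-the-envelope sketch using only the numbers from this article's tables (peak FP16 TFLOP/s, TDP, and the BERT-base latencies); it is not an official AWS methodology.

```python
# Rough efficiency math from the article's spec and benchmark figures.
# All numbers are taken directly from the tables above.

specs = {
    # chip: (peak FP16 TFLOP/s, TDP in watts)
    "Trainium 3": (2400, 350),
    "Nvidia A100": (1600, 400),
    "Google TPU-v5": (1800, 300),
}

# FP16 TFLOP/s delivered per watt of TDP.
perf_per_watt = {chip: tflops / tdp for chip, (tflops, tdp) in specs.items()}

# Latency speedup of Trainium 3 over the A100 on BERT-base
# (3.4 ms vs 2.1 ms from the benchmark table); > 1 means faster.
bert_speedup = 3.4 / 2.1

for chip, ppw in perf_per_watt.items():
    print(f"{chip}: {ppw:.2f} TFLOP/s per W")
print(f"BERT-base speedup vs A100: {bert_speedup:.2f}x")
```

On these reported numbers, Trainium 3 leads on raw throughput per watt, with TPU-v5 second and the A100 last, which is consistent with the article's TCO claim.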

