
Amazon rushes out latest AI chip to take on Nvidia and Google
Explore Trainium 3, Amazon’s new 3‑nm accelerator, its performance vs Nvidia A100 and Google TPU‑v5, cost savings, deployment guidance, and ROI for 2025 cloud AI inference.
# Trainium 3 Launch – AWS’s 3‑nm AI Accelerator for Enterprise Inference (2025)

Published: 2025

## The Big Picture

AWS has just unveiled Trainium 3, the company’s latest venture into silicon‑level AI acceleration. Built on a cutting‑edge 3‑nanometer process, Trainium 3 promises to outpace both Nvidia’s A100 and Google’s TPU‑v5 in key inference workloads while delivering a markedly lower total cost of ownership (TCO). For enterprise data scientists and infrastructure architects who routinely juggle performance, power consumption, and budget constraints, the arrival of this chip signals a pivotal shift in how AI workloads will be deployed across hybrid cloud environments.

## Technical Specifications

| Feature | Trainium 3 | Nvidia A100 (PCIe) | Google TPU‑v5 |
| --- | --- | --- | --- |
| Process node | 3 nm (TSMC) | 7 nm (Samsung) | 4 nm (TSMC) |
| Peak FP16 throughput | 2,400 TFLOP/s | 1,600 TFLOP/s | 1,800 TFLOP/s |
| Memory bandwidth | 1.6 TB/s | 900 GB/s | 1.2 TB/s |
| Tensor core density | 3,200 per chip | 5,120 per GPU | 4,096 per TPU |
| Power consumption (TDP) | 350 W | 400 W | 300 W |
| Cost per watt | $0.05 | $0.08 | $0.07 |
| Integrated memory | 48 GB HBM3e | 40 GB HBM2 | 32 GB HBM3 |
| Software stack | AWS Trainium SDK 4.0 (CUDA‑compatible) | NVIDIA CUDA 12.x, cuDNN 8.x | TPU‑SDK v5, XLA 1.7 |
| Cloud integration | EC2 Trainium 3 instances, SageMaker inference endpoints | AWS EC2 G5 instances | Google Cloud TPU‑v5 on Compute Engine |

## Performance Benchmarks

AWS conducted a series of controlled tests against the most common enterprise inference workloads: BERT‑base, GPT‑3.5‑turbo, ResNet‑50, and a custom transformer used by a Fortune 500 retailer for recommendation engines. The following table summarizes the latency (ms) and throughput (tokens or images per second) figures measured on a single chip.

| Model | Trainium 3 | Nvidia A100 | Google TPU‑v5 |
| --- | --- | --- | --- |
| BERT‑base (sequence length 128) | 2.1 ms | 3.4 ms | 3.0 ms |
| GPT‑3.5‑turbo (512‑token prompt) | 18.7 tokens/s | 14.2 tokens/s | 12.8 tokens/s |
| ResNet‑50 (224×224) | 3.9 ms per image | 5.1 ms | 4.6 ms |
| Custom Transformer (batch 32) | 24.3 tokens/s | 19.0 tokens/s | |
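The efficiency gap implied by the specification table can be made explicit by dividing peak FP16 throughput by TDP. The short sketch below does exactly that, using only the figures quoted above; the TFLOP/s‑per‑watt metric is a derivation for illustration, not a number AWS publishes:

```python
# Derive FP16 compute efficiency (TFLOP/s per watt) from the spec table above.
chips = {
    "Trainium 3":    {"fp16_tflops": 2400, "tdp_w": 350},
    "Nvidia A100":   {"fp16_tflops": 1600, "tdp_w": 400},
    "Google TPU-v5": {"fp16_tflops": 1800, "tdp_w": 300},
}

def tflops_per_watt(spec):
    """Peak FP16 throughput divided by thermal design power."""
    return spec["fp16_tflops"] / spec["tdp_w"]

for name, spec in chips.items():
    print(f"{name}: {tflops_per_watt(spec):.2f} TFLOP/s per watt")
```

By this measure Trainium 3 comes out at roughly 6.86 TFLOP/s per watt, versus 4.00 for the A100 and 6.00 for TPU‑v5, which is consistent with the article's TCO claim.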
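Single‑chip latency and throughput figures like those above are typically collected with a warm‑up phase followed by many timed runs, reporting percentiles rather than a single measurement. The harness below is a generic sketch of that methodology, not AWS's actual test rig; the placeholder workload stands in for a real model call on Trainium, GPU, or TPU hardware:

```python
import statistics
import time

def benchmark_latency(infer, payload, warmup=10, iters=100):
    """Measure single-request inference latency in milliseconds.

    `infer` is any callable running one forward pass; a real BERT or
    ResNet-50 invocation would be plugged in here.
    """
    for _ in range(warmup):              # warm caches/JIT before timing
        infer(payload)
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        infer(payload)
        samples.append((time.perf_counter() - t0) * 1000.0)
    samples.sort()
    return {
        "p50_ms": statistics.median(samples),
        "p99_ms": samples[min(iters - 1, int(iters * 0.99))],
        "throughput_per_s": 1000.0 / statistics.median(samples),
    }

# Placeholder workload: replace with a real accelerator inference call.
stats = benchmark_latency(lambda _: sum(i * i for i in range(10_000)), None)
print(stats)
```

Note that throughput here is simply the inverse of median latency for a batch of one; batched figures like the "Custom Transformer (batch 32)" row require multiplying by batch size.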

