iPhone 16 Pro Max Neural Engine produces garbage LLM output - 15 Pro runs same code fine

A developer discovered their iPhone 16 Pro Max generates nonsense when running MLX framework LLMs, while an iPhone 15 Pro and MacBook Pro execute identical code perfectly. Tensor outputs show numerical values an order of magnitude wrong, pointing to potential Neural Engine hardware defects in Apple's latest flagship.

The Problem

Developer Rafael Costa hit a peculiar wall while building a simple expense-tracking app: MLX-based LLMs produce gibberish on his iPhone 16 Pro Max, while the same code runs perfectly on an iPhone 15 Pro and a MacBook Pro. Asked "What is 2+2?", the 16 Pro Max outputs strings like "Applied.....*_dAK[...]" instead of a coherent response.

The tensor values are wrong by an order of magnitude. That's not a software quirk - that's hardware.
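
"An order of magnitude wrong" is checkable in a few lines. As a minimal sketch (the arrays here are hypothetical logits, not Costa's actual tensor dumps), one could compare a device's outputs against a known-good reference and flag elementwise ratios that blow past 10x:

```python
import numpy as np

def magnitude_divergence(reference: np.ndarray, device: np.ndarray, tol: float = 10.0) -> bool:
    """Return True if device outputs differ from the reference by
    roughly an order of magnitude or more (elementwise ratio check)."""
    eps = 1e-12  # avoid division by zero on near-zero values
    ratio = np.abs(device) / (np.abs(reference) + eps)
    # Flag elements that are >= tol times larger or smaller than expected.
    return bool(np.any((ratio >= tol) | (ratio <= 1.0 / tol)))

# Hypothetical logits: a healthy device tracks the reference closely...
ref = np.array([1.2, -0.8, 3.4])
good = ref * 1.05
# ...while a faulty accelerator returns values 10x+ off.
bad = ref * 12.0

print(magnitude_divergence(ref, good))  # False
print(magnitude_divergence(ref, bad))   # True
```

A check like this, run against a Mac-generated reference, is one way to separate "hardware computes wrong numbers" from ordinary sampling nondeterminism.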

What This Means

Apple's MLX framework is the enterprise path for on-device LLM inference on iOS, leveraging the Neural Engine for performance. Published benchmarks suggest the A18 Pro chip in the iPhone 16 Pro should deliver 150+ tokens/second for small models like Qwen 0.5B - impressive numbers for edge AI deployments.
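
Tokens-per-second figures like these are just generated tokens divided by wall-clock time. A minimal harness (with `generate_fn` as a hypothetical stand-in for a real model call - MLX itself only runs on Apple silicon):

```python
import time

def tokens_per_second(generate_fn, prompt: str) -> float:
    """Time one generation call and return throughput in tokens/second.
    generate_fn is a hypothetical callable returning a list of token ids."""
    start = time.perf_counter()
    tokens = generate_fn(prompt)
    elapsed = time.perf_counter() - start
    return len(tokens) / elapsed

# Stub standing in for an on-device model: "emits" 300 tokens after a short sleep.
def fake_generate(prompt: str) -> list[int]:
    time.sleep(0.01)
    return list(range(300))

print(f"{tokens_per_second(fake_generate, 'What is 2+2?'):.0f} tok/s")
```

The point of the article is that throughput is the wrong metric here: a benchmark harness like this reports healthy numbers whether or not the tokens are correct.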

But isolated reports suggest Neural Engine issues in production devices. Costa's experience mirrors a GitHub bug report on mlx-swift-examples noting similar gibberish from MLXChatExample on iPhone 16 Pro hardware.

This isn't like the iOS 18 complaints about AI features causing lag and draining battery - those have software fixes. This is a computational accuracy failure at the silicon level.

The Broader Context

For enterprise architects evaluating on-device AI strategies, this matters. MLX vs PyTorch debates for mobile inference assume hardware reliability. Quantization settings (int8 vs float16) and memory optimization become moot if the Neural Engine produces wrong answers.
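
To see why quantization tuning is the wrong layer to debug, consider what well-behaved quantization error looks like. A minimal sketch of symmetric per-tensor int8 quantization (a generic illustration, not MLX's actual scheme):

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: map values onto [-127, 127]."""
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# Roundtrip error is bounded by half a quantization step per element -
# a small, predictable loss, nothing like order-of-magnitude divergence.
max_err = np.max(np.abs(w - w_hat))
print(max_err < scale)  # True
```

Quantization noise is bounded and systematic; tensors wrong by 10x are not a quantization artifact, which is why no int8/float16 setting can paper over a faulty accelerator.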

Apple's expanding APAC presence in enterprise edge ML - financial services apps processing transactions locally, healthcare apps analyzing data on-device - relies on consistent hardware behavior across device generations.

What We're Watching

The open question is whether this is an isolated defect in particular units or a broader A18 Pro issue. Apple hasn't publicly acknowledged any Neural Engine problems. The workaround? Use last year's phone.

That's not a deployment strategy.