Block's Goose agent plus Qwen3-coder: testing the free Claude Code alternative
Block's open-source Goose agent, paired with Alibaba's Qwen3-coder model running on Ollama, is positioning itself as a viable alternative to Claude Code. The appeal is straightforward: no cloud dependencies, no subscription costs, runs entirely on local hardware.
What's actually happening here
Goose functions as an autonomous developer agent - planning tasks, executing code operations, and iterating through an agent loop. The critical technical requirement is tool calling: the model's ability to execute file operations and commands. Qwen3-coder supports this even at 8B parameters with 4-bit quantization, a notably compact configuration.
The architecture matters. Qwen3-coder-Flash uses 32 query attention heads and 4 key-value attention heads per layer, optimized for speed. Benchmark performance puts it competitive with larger models - it scored 31.3 on terminal benchmarks, comparable to models several times its size.
The fine print
This isn't turnkey. GitHub issue #6883 documents that Qwen3-coder fails tool execution when using Goose's full default extension set (11 tools). The stack works in controlled scenarios but shows brittleness at the edges.
Setup requires technical competence: installing Ollama, configuring the model, pointing Goose at the right endpoints. For developers comfortable with CLI tools, this is straightforward. For teams expecting GitHub Copilot's plug-and-play experience, it's friction.
Why this matters for APAC enterprise
Three trends converging: Chinese AI models gaining developer adoption in competitive markets, open-source infrastructure eliminating recurring costs, and growing enterprise interest in locally-deployed coding agents for data sovereignty.
The technical viability of running comparable AI coding capabilities locally changes the economics of coding assistant procurement. CTOs evaluating vendor lock-in against total cost of ownership now have a third option: build it yourself with open components.
What we're watching
Whether this stack handles multi-repository enterprise codebases. Whether tool calling limitations scale with complexity. Whether the gap between "works in demos" and "ships production code" closes or widens.
The vendor says it works. We'll see.