
Ollama Ships Kimi CLI, Guts MLX Sampling
Ollama's v0.21.1 ships Kimi CLI support and tightens MLX performance, but your model picker might still show stale choices on macOS. Here’s what to migrate now.
Topic
2 news articles · 6 tutorials

Ollama's v0.21.0 ships Hermes Agent and Copilot CLI integration, but the real win is smoother local inference on Macs and cleaner config management. It is a quiet upgrade cycle that matters more than the flashy features.

Cline is a VS Code extension that turns any LLM endpoint into a coding agent. I wired it to a local Ollama Qwen 2.5 7B and ran a week of work without a single API call.

Continue is the open-source alternative to Cursor and Copilot. I run it in VS Code on Linux and IntelliJ IDEA on the same projects. This is the config that survived the move.

I ran Gemma 4 9B locally on a 32GB Linux box for a week of normal use. This is the Ollama install, the model variants worth pulling, and the one config tweak the docs do not flag.

Qwen 2.5 7B is my preferred local coding model. The pull is 4.4GB, which on Indian fibre means a 12-20 minute commitment. This is the install plus the bandwidth-saving tricks I use.

Open WebUI is the polished web frontend for Ollama. I run it as the family-friendly AI chat at home. This is the install I use, including the no-Docker path I worked out.

Local inference setup for Gemma 4 with Ollama on Mac, Windows, and Linux. Hardware requirements, performance benchmarks, and use cases where local beats cloud.