Topic

Ollama

3 news articles · 7 tutorials

News

GLM-5.2 Ships 753B Open Weights. My GTX 1660 Holds 6 GB.

In mid-June 2026 Z.ai shipped GLM-5.2, a 753-billion-parameter open-weight model that runs roughly even with GPT-5.5 on long-horizon coding. The card on my desk holds 6 GB. After months of running Gemma 3 4B vision on a GTX 1660, here is what self-hosting open models really costs in VRAM, latency, and quality.

Aditya Sharma · Jun 28, 2026 · 7 MIN

Launches

Ollama Ships Kimi CLI, Guts MLX Sampling

Ollama v0.21.1 adds Kimi CLI support and tightens MLX performance, but your model picker might still show stale choices on macOS. Here's what to migrate now.

AutoKaam Editorial · Apr 28, 2026 · 6 MIN

Launches

v0.21.0 Lands: Ollama Ships Hermes, Crowds Out Cloud Copilot

Ollama v0.21.0 adds Hermes Agent and Copilot CLI integrations, but the real win is smoother local inference on Macs and cleaner config management: a quiet upgrade that matters more than the flashy features.

AutoKaam Editorial · Apr 26, 2026 · 6 MIN

Tutorials

⚡ Automation · intermediate

I Run Gemma 3 Vision On A 6GB GTX 1660 For Screenshot OCR: The Real VRAM And Latency Numbers

I host Gemma 3 4B vision on a single 6GB GTX 1660 for screenshot OCR and invoice extraction. Here are the install steps, the exact model tag, the VRAM it actually eats, and the cold versus warm latency I measured this week on my own desktop.

Jun 28, 2026 · 7 min read

💻 AI Coding · intermediate

Cline VS Code Extension With Local LLM, Free Agentic Coding

Cline is a VS Code extension that turns any LLM endpoint into a coding agent. I wired it to Qwen 2.5 7B running locally in Ollama and ran a week of work without a single cloud API call.

May 6, 2026 · 7 min read

💻 AI Coding · intermediate

Continue.dev In VS Code And JetBrains, Open Source AI Coding Setup

Continue is the open-source alternative to Cursor and Copilot. I run it in VS Code on Linux and in IntelliJ IDEA on the same projects. This is the config that survived the move.

May 6, 2026 · 7 min read

⚡ Automation · beginner

Ollama Gemma 4 On Linux, From Install To First Token

I ran Gemma 4 9B locally on a Linux box with 32GB of RAM for a week of normal use. This is the Ollama install, the model variants worth pulling, and the one config tweak the docs do not flag.

May 6, 2026 · 6 min read

⚡ Automation · beginner

Ollama Qwen 2.5 7B, India Bandwidth Notes And Setup

Qwen 2.5 7B is my preferred local coding model. The pull is 4.4GB, which on Indian fibre means a 12 to 20 minute commitment. This is the install plus the bandwidth-saving tricks I use.

May 6, 2026 · 6 min read

⚡ Automation · intermediate

Open WebUI, Self-Hosted Frontend For Your Local Ollama

Open WebUI is a polished web frontend for Ollama. I run it as our family-friendly AI chat at home. This is the install I use, including the no-Docker path I worked out.

May 6, 2026 · 7 min read

⚡ Automation · advanced

Running Gemma 4 Locally With Ollama, Setup Guide For Indian Devs

Local inference setup for Gemma 4 with Ollama on Mac, Windows, and Linux, covering hardware requirements, performance benchmarks, and the use cases where local beats cloud.

Apr 4, 2026 · 8 min read