You don't need a GPU‑heavy machine to work with large language models like Nemotron 3 Nano, GLM 4.7, and Kimi‑K2. With Ollama acting as a local gateway and the Claude Code CLI as your interface, you can talk to cloud‑hosted open models while your laptop stays fast, cool, and responsive. Your computer becomes the control panel. The cloud does the heavy lifting.

![Ollama and Claude CLI](/content/uploads/2026/02/Cursor_and___Base_Claude-over-Ollama_wrapper_%E2%80%A2_Untitled-1_%E2%80%94__zsh_d.png)

***

## What This Ollama and Claude CLI Setup Actually Does

Your laptop → Claude CLI → Ollama → Cloud inference

The models are **not running on your machine**. They are hosted remotely, and Ollama forwards your requests to them. This means:

- No GPU upgrade required
- No RAM overload
- No laptop overheating
- Access to large, capable models anyway

***

### 1. Install Ollama

Download and install Ollama from the official website.

Verify the installation:

```bash
ollama --version
```

***

### 2. Confirm You Can Run Cloud Models

Ollama supports cloud‑hosted variants of some open models. Many Ollama cloud models use `:cloud` or `-cloud` style suffixes, but you should always use the exact model name shown in Ollama's library:

```bash
ollama run <model-name>:cloud
```

Examples:

```bash
ollama run glm-4.7:cloud
ollama run kimi-k2.5:cloud
ollama run nemotron-3-nano:30b-cloud
ollama run devstral-small-2:24b-cloud
ollama run minimax-m2.1:cloud
ollama run gemini-3-pro-preview:latest
ollama run gemini-3-flash-preview:latest
ollama run gpt-oss:120b-cloud
ollama run gpt-oss:20b-cloud
```

If the model starts without downloading gigabytes locally, you're using cloud inference correctly.

***

### 3. Install Claude Code CLI

You'll use the Claude Code CLI as your main interface for chatting with and coding alongside these models.

#### Install via npm

```bash
npm install -g @anthropic-ai/claude-code
```

Verify the installation:

```bash
claude --version
```

You do **not** need to log in with a paid Anthropic key for this setup, since we'll route requests through Ollama instead. Once installed, you'll be able to point Claude at Ollama rather than Anthropic's hosted models.

***

### 4. Create a Clean Ollama Wrapper for Claude

Instead of repeating environment variables every time, define a helper alias or function in your shell config (`.zshrc` or `.bashrc`).

#### Base wrapper

```bash
alias claude-ollama='ANTHROPIC_AUTH_TOKEN=ollama \
ANTHROPIC_BASE_URL=http://localhost:11434 \
ANTHROPIC_API_KEY="" \
claude'
```

This tells the Claude CLI to:

- Send requests to Ollama
- Use Ollama as the "auth provider"
- Skip real Anthropic API keys

***

### 5. Create Model Shortcuts (Much Cleaner)

We use aliases instead of changing Claude's global configuration so your **default Claude setup stays untouched**. This lets you switch between standard Claude usage and Ollama‑routed cloud models without breaking anything or rewriting config files.

Now you can make readable aliases for each cloud model:

```bash
alias glm='claude-ollama --model glm-4.7:cloud'
alias kimi-k2='claude-ollama --model kimi-k2.5:cloud'
alias nemo='claude-ollama --model nemotron-3-nano:30b-cloud'
```

Run them in the terminal:

```bash
glm
kimi-k2
nemo
```

And the Claude CLI will talk to those cloud‑hosted models through Ollama.

![GLM 4.7 on Claude CLI](/content/uploads/2026/02/glm-carmelyne.png)

***

### 6. These Cloud Models Can Use Tools Too

Because you're accessing them through the Claude Code CLI, these cloud‑inferenced models aren't limited to plain chat.
They can use the same developer tools Claude normally can, such as:

- Reading and writing files
- Running shell commands like `grep`
- Performing web searches (when enabled in your CLI setup)

So even though the model is running remotely, it can still act locally through Claude's tool layer. Your laptop becomes the execution environment, while the cloud model provides the intelligence.

This is what makes the setup powerful: lightweight hardware, full developer‑agent capabilities.

***

### 7. What You Just Built

You now have a hybrid AI setup:

- Local machine → lightweight routing
- Cloud GPUs → heavy reasoning

Your laptop stays fast and quiet while you still get access to large‑scale models.

This is especially useful if you:

- Use a MacBook Air or older laptop
- Work while traveling
- Don't want to maintain a local GPU setup

***

## Free Tier and Paid Limits

Ollama offers a Free tier, with paid plans such as Pro at $20/month and Max at $100/month if you need higher limits. You're essentially renting bursts of supercomputer time only when you need it.

***

## Cloud Model Reference Table
| Model Name | Parameters | Core Focus / Capabilities |
| --- | --- | --- |
| glm-5:cloud | 744B (40B active) | Advanced reasoning, systems engineering |
| qwen3-vl:cloud | 235B+ | State-of-the-art vision & thinking |
| devstral-2:cloud | 123B | Codebase exploration, multi-file editing |
| nemotron-3-super:cloud | 120B (MoE) | NVIDIA; complex multi-agent apps |
| qwen3.5:cloud | 122B (Max) | Multimodal, vision, thinking, tools |
| qwen3-next:cloud | 80B | High parameter efficiency & speed |
| nemotron-3-nano:cloud | 30B | Optimized agentic workflows |
| devstral-small-2:cloud | 24B | Specialized software engineering agents |
| kimi-k2.5:cloud | N/A | Multimodal agentic, instant/thinking modes |
| minimax-m2.5:cloud | N/A | Productivity, coding, high-efficiency |
| gemini-3-flash:preview | N/A | Frontier speed & multimodal reasoning |
| deepseek-v3.2:cloud | N/A | High efficiency, agent performance |
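To check which of the models above are actually available to your install, you can query the Ollama daemon's local REST API rather than scanning `ollama list` output by eye. The sketch below filters the `/api/tags` response for cloud variants; the JSON sample at the bottom is illustrative, not real output from your machine:

```shell
# Extract model names from Ollama's /api/tags JSON response,
# keeping only the cloud variants.
list_cloud_models() {
  grep -o '"name":"[^"]*"' | sed 's/"name":"//; s/"$//' | grep cloud
}

# Live usage (requires a running Ollama daemon on the default port):
#   curl -s http://localhost:11434/api/tags | list_cloud_models

# Illustrative sample of the JSON shape the endpoint returns:
sample='{"models":[{"name":"glm-4.7:cloud"},{"name":"llama3.2:3b"},{"name":"gpt-oss:120b-cloud"}]}'
printf '%s' "$sample" | list_cloud_models
```

For anything beyond a quick check, a proper JSON tool like `jq` is a better fit than `grep`, but the pipeline above works with nothing installed beyond a POSIX shell.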
***

### Summary

- **Ollama** handles routing.
- **Claude CLI** gives you a powerful developer interface.
- The **cloud** runs the big models.

Your laptop just orchestrates everything. And that's a very efficient way to work with modern LLMs.

***

## Quick Reference: Can I...? / Do I need...?

*A tactical checklist for scaling your inference without scaling your hardware.*

- **Can I use Ollama cloud without a GPU?** Yes. The model inference runs in Ollama's cloud instead of on your laptop.
- **Can I use Claude Code with Ollama cloud?** Yes. Ollama supports Claude Code through an Anthropic-compatible API.
- **Do I need to pull models locally first?** Usually, yes. Pulling a supported cloud model makes it available to your Ollama install.
- **Can I disable cloud later?** Yes. Use `OLLAMA_NO_CLOUD=1` or disable cloud in `server.json` for a local-only setup.
- **What does Ollama say about data retention?** Ollama says its cloud does not retain your data.

**Summary:** This setup lets a lightweight laptop act as a practical control surface for much larger cloud-hosted models.
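If you prefer a single helper that takes the model name as an argument (functions handle arguments more cleanly than aliases), the per-model aliases from earlier can collapse into one shell function. A minimal sketch — the name `claude_ollama` is just an illustrative choice, and the environment variables are the same ones the base wrapper sets:

```shell
# Run Claude Code against any Ollama cloud model, passing extra args through.
# Uses the same environment variables as the alias-based wrapper above.
claude_ollama() {
  model="$1"; shift
  ANTHROPIC_AUTH_TOKEN=ollama \
  ANTHROPIC_BASE_URL=http://localhost:11434 \
  ANTHROPIC_API_KEY="" \
  claude --model "$model" "$@"
}

# Usage:
#   claude_ollama glm-4.7:cloud
#   claude_ollama kimi-k2.5:cloud
```

Drop it in your `.zshrc` or `.bashrc` alongside the aliases, and new cloud models cost you nothing to try: no new alias needed, just a different first argument.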