Using Ollama as a local gateway and the Claude Code CLI as your interface, you can work with large cloud-hosted AI models like Nemotron 3 Nano, GLM 4.7, and Kimi-K2 while your own machine stays fast and cool.

You don’t need a GPU‑heavy machine to use large language models.

With Ollama acting as a local gateway and the Claude Code CLI as your interface, you can talk to cloud‑hosted open models while your laptop stays cool and responsive.

Your computer becomes the control panel. The cloud does the heavy lifting.

Ollama and Claude CLI


What This Ollama and Claude CLI Setup Actually Does

Your laptop → Claude CLI → Ollama → Cloud inference

The models are not running on your machine. They are hosted remotely, and Ollama forwards your requests to them.

This means:

  • No GPU upgrade required
  • No RAM overload
  • No laptop overheating
  • Access to large, capable models anyway

1. Install Ollama

Download and install Ollama from:

https://ollama.com

Verify installation:

ollama --version

2. Confirm You Can Run Cloud Models

Ollama supports cloud-hosted variants of some open models. The usual pattern is:

ollama run <model-name>:cloud

Examples:

ollama run glm-4.7:cloud
ollama run kimi-k2.5:cloud
ollama run nemotron-3-nano:30b-cloud
ollama run devstral-small-2:24b-cloud
ollama run minimax-m2.1:cloud
ollama run gemini-3-pro-preview:latest
ollama run gemini-3-flash-preview:latest
ollama run gpt-oss:120b-cloud
ollama run gpt-oss:20b-cloud

If the model starts without downloading gigabytes locally, you’re using cloud inference correctly.
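A quick way to sanity-check this is to confirm the Ollama daemon is reachable and that the model did not land on disk. This is a sketch using standard Ollama commands and its default local port:

```shell
# Confirm the Ollama daemon is up on its default port (11434)
curl -s http://localhost:11434/api/version

# List locally stored models -- a :cloud model should not show up
# here as a multi-gigabyte download after you run it
ollama list

# Show which models are currently loaded
ollama ps
```

If `ollama list` stays small after running a `:cloud` model, inference is happening remotely.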


3. Install Claude Code CLI

You’ll use the Claude Code CLI as your main interface for chatting with and coding alongside these models.

Install via npm

npm install -g @anthropic-ai/claude-code

Verify installation:

claude --version

You do not need to log in with a paid Anthropic key for this setup, since we’ll route requests through Ollama instead.

Once the CLI is installed, you can point Claude at Ollama instead of Anthropic's hosted models.


4. Create a Clean Ollama Wrapper for Claude

Instead of repeating environment variables every time, define a helper alias or function in your shell config (.zshrc or .bashrc).

Base wrapper

alias claude-ollama='ANTHROPIC_AUTH_TOKEN=ollama \
ANTHROPIC_BASE_URL=http://localhost:11434 \
ANTHROPIC_API_KEY="" \
claude'

This tells Claude CLI to:

  • Send requests to Ollama
  • Use Ollama as the “auth provider”
  • Skip real Anthropic API keys
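To verify the wrapper end to end, Claude's non-interactive print mode (`-p`) is handy. The model tag below is one of the cloud tags from step 2; swap in whichever you use:

```shell
# Reload your shell config so the new alias is available
source ~/.zshrc   # or: source ~/.bashrc

# One-shot prompt routed through Ollama -- no Anthropic API key involved
claude-ollama --model glm-4.7:cloud -p "Reply with the single word: ready"
```

If you get a response, requests are flowing laptop → Claude CLI → Ollama → cloud.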

5. Create Model Shortcuts (Much Cleaner)

We use aliases instead of changing Claude’s global configuration so your default Claude setup stays untouched. This lets you switch between standard Claude usage and Ollama‑routed cloud models without breaking anything or rewriting config files.

Now you can make readable aliases for each cloud model:

alias glm='claude-ollama --model glm-4.7:cloud'
alias kimi-k2='claude-ollama --model kimi-k2.5:cloud'
alias nemo='claude-ollama --model nemotron-3-nano:30b-cloud'

Run them in the terminal:

glm
kimi-k2
nemo

And Claude CLI will talk to those cloud‑hosted models through Ollama.
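If you would rather not define one alias per model, a small shell function (the name `co` here is just an example) covers any cloud tag and passes extra arguments through to Claude:

```shell
# co <model-tag> [claude args...] -- route any Ollama cloud model through Claude CLI
co() {
  local model="$1"; shift   # first argument is the model tag
  ANTHROPIC_AUTH_TOKEN=ollama \
  ANTHROPIC_BASE_URL=http://localhost:11434 \
  ANTHROPIC_API_KEY="" \
  claude --model "$model" "$@"
}

# Usage:
#   co glm-4.7:cloud
#   co kimi-k2.5:cloud -p "Summarize this repo"
```

This keeps your shell config to one definition while still letting you add short aliases for favorites.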

GLM 4.7 on Claude CLI


6. These Cloud Models Can Use Tools Too

Because you’re accessing them through the Claude Code CLI, these cloud-hosted models aren’t limited to plain chat. They can use the same developer tools Claude normally can, such as:

  • Reading and writing files
  • Running shell commands like grep
  • Performing web searches (when enabled in your CLI setup)

So even though the model is running remotely, it can still act locally through Claude’s tool layer. Your laptop becomes the execution environment, while the cloud model provides the intelligence.

This is what makes the setup powerful: lightweight hardware, full developer‑agent capabilities.
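In print mode you can also scope which tools the model may use. The sketch below assumes the `glm` alias from step 5 and Claude Code's `--allowedTools` flag; exact tool names and flag syntax may vary between CLI versions:

```shell
# Let the cloud model read local files and run a narrowly scoped shell command
glm -p "Count the lines in every Markdown file in this directory" \
  --allowedTools "Read" "Bash(wc:*)"
```

The model reasons in the cloud, but the file reads and `wc` invocations happen on your machine.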


7. What You Just Built

You now have a hybrid AI setup:

  • Local machine → lightweight routing
  • Cloud GPUs → heavy reasoning

Your laptop stays fast and quiet while you still get access to large‑scale models.

This is especially useful if you:

  • Use a MacBook Air or older laptop
  • Work while traveling
  • Don’t want to maintain a local GPU setup

Free Tier and Paid Limits

Some cloud models offer free usage tiers. If you hit limits, higher quotas are often around $20 / month — far cheaper than upgrading hardware.

You’re essentially renting bursts of supercomputer time only when you need it.


Summary

  • Ollama handles routing.
  • Claude CLI gives you a powerful developer interface.
  • The cloud runs the big models.

Your laptop just orchestrates everything.

And that’s a very efficient way to work with modern LLMs.
