If you are using the new `ollama launch codex` flow to run remote inference from your local terminal, you might have run into a familiar friction point: **The Metadata Warning.** When you run:

```bash
ollama launch codex --model nemotron-3-super:cloud
```

Codex often greets you with:

```text
Model metadata for `nemotron-3-super:cloud` not found. Defaulting to fallback metadata; this can degrade performance and cause issues.
```

This happens because Codex expects a static profile in your `~/.codex/config.toml` for every model it interacts with. But in a world where cloud model slugs like `nemotron-3`, `qwen-3.5`, and `glm-5` are moving at lightspeed, maintaining static profiles is a losing game. I wanted a "Clean Path": a way to treat the launcher input as dynamic metadata so I never have to touch a config file again.

## The Problem: Static vs. Ephemeral

The traditional way to fix this is to hand-craft a profile:

```toml
# ~/.codex/config.toml
[[profiles]]
name = "nemotron-3-super:cloud"
model_context_window = 131072
auto_compact_token_limit = 80000
```

But as soon as you switch to `qwen-3.5:397b-cloud`, you're back at the drawing board. The ownership boundary is the **launcher handoff**, not the static config.

## The Solution: The "Launcher-as-Profile" Pattern

Instead of fighting the configuration, I built a lightweight global wrapper for `codex`. When Ollama launches a session, the wrapper intercepts the call, fetches the metadata from the local Ollama API, and synthesizes an ephemeral profile on the fly.

### How it works under the hood

The wrapper looks for signals that it's being launched in an Ollama-backed session (like `OPENAI_API_KEY=ollama`). It then:

1. **Extracts the Model Slug:** It pulls the exact `--model` string passed by Ollama.
2. **Queries the Local Gateway:** It hits `http://127.0.0.1:11434/api/show` to see what Ollama knows about that model.
3. **Applies Intelligent Heuristics:** If cloud metadata is sparse, it uses family-aware logic:
   * `nemotron*` models get a **128k** context window.
   * `qwen3*` models get a **256k** context window.
   * Generic cloud models default to a safe **64k** or **128k**.
4. **Injects Runtime Config:** It uses Codex's `-c` flag to pass these values directly to the binary, bypassing the need for a persistent file.

```bash
# What the wrapper actually executes:
codex-openai --oss -m nemotron-3-super:cloud \
  -c model_context_window=131072 \
  -c auto_compact_token_limit=78643 \
  -c tool_output_token_limit=12000
```

## Why this is the "Senior" Move

As a developer, your terminal should be a **Tactical Command Deck**, not a maintenance chore. By moving the logic into a wrapper, we achieve:

* **Zero Config Churn:** No more `config.toml` sprawl.
* **Infrastructure Sovereignty:** You control how context windows are calculated per family.
* **Future-Proofing:** If Ollama starts exposing richer metadata, we update the wrapper in one place, and every model benefits immediately.

## Setting it up

If you're already running my `ollama-codex-wrapper.mjs`, you've probably noticed your logs look much cleaner. For those who want to implement this:

1. Move your original `codex` binary to `codex-openai`.
2. Place the wrapper script at `/usr/local/bin/codex`.
3. Ensure your wrapper caches generated catalogs in `~/.cache/ollama-launch/` for performance.

## Testing with Nemotron 3

Once installed, the launch is seamless.
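Before testing, it helps to know what "correct" looks like. The family-aware fallback from step 3 boils down to a small lookup; here's a minimal sketch of that shape (illustrative names, not the wrapper's actual source, and the 60% compaction ratio is inferred from the command shown earlier):

```js
// A sketch of the family-aware fallback (step 3 above). Function names
// and the compaction ratio are assumptions, not copied from the wrapper.
function guessContextWindow(slug) {
  const s = slug.toLowerCase();
  if (s.startsWith("nemotron")) return 131072; // 128k for the Nemotron family
  if (s.startsWith("qwen3")) return 262144;    // 256k for Qwen3
  if (s.includes("cloud")) return 131072;      // generic cloud default
  return 65536;                                // conservative 64k floor
}

function autoCompactLimit(contextWindow) {
  // Reserve ~40% headroom: floor(131072 * 0.6) = 78643, which matches
  // the -c auto_compact_token_limit value in the example above.
  return Math.floor(contextWindow * 0.6);
}

console.log(guessContextWindow("nemotron-3-super:cloud")); // 131072
console.log(autoCompactLimit(131072));                     // 78643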
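You can also inspect the raw metadata the wrapper starts from by hitting the same endpoint it queries. A quick Node one-off (the keys under `model_info` vary by architecture, and cloud models may omit the context length entirely; that's exactly the gap the heuristics fill):

```js
// show-metadata.mjs — preview what Ollama reports for a model.
// Usage: node show-metadata.mjs nemotron-3-super:cloud
const model = process.argv[2] ?? "nemotron-3-super:cloud";

const res = await fetch("http://127.0.0.1:11434/api/show", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ model }),
});
if (!res.ok) throw new Error(`Ollama returned ${res.status}`);

const info = await res.json();

// `details.family` drives the family-aware heuristics; the context
// length key (e.g. "llama.context_length") differs per architecture.
const family = info.details?.family ?? "unknown";
const ctxKey = Object.keys(info.model_info ?? {}).find((k) =>
  k.endsWith(".context_length")
);
console.log(`family:  ${family}`);
console.log(`context: ${ctxKey ? info.model_info[ctxKey] : "not reported"}`);
```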
You can even verify the dynamic overrides with a debug flag:

```bash
CODEX_OLLAMA_WRAPPER_DEBUG=1 ollama launch codex --model nemotron-3-super:cloud
```

You'll see the wrapper instantly detect the Nemotron family and assign the correct 128k context window: no warnings, no degradation, just pure agentic speed.

***

## Frequently Asked Questions

### Does this affect my OpenAI/Anthropic profiles?

No. The wrapper only activates when it detects an `ollama` provider signal. Your standard API profiles remain untouched.

### Can I still override the context window manually?

Yes! I built in environment overrides: `CODEX_OLLAMA_CONTEXT_WINDOW=262144` will always take precedence over the heuristics.

### Why not just use `ollama launch claude`?

Claude Code is fantastic, but Codex remains a powerhouse for focused refactoring and developer-grade code generation workflows. Having both in your toolkit means you can pick the right agent for the job.

### What is the benefit of using Ollama for local models?

Using [Ollama](https://ollama.com) allows you to run massive models like [NVIDIA Nemotron](https://nvidia.com) or [Qwen3](https://huggingface.co/Qwen) without needing to manage complex configurations. It bridges the gap between local hardware and cloud intelligence.

### Is this compatible with other CLI tools?

Yes. The same launcher-as-profile pattern works for any CLI that accepts runtime configuration overrides.

***

*The goal isn't just to use AI; it's to build the infrastructure that makes using AI feel like magic. 💚*