If you are using the new `ollama launch codex` flow to run cloud inference from your local terminal, you might have run into a familiar friction point: the metadata warning.
When you run:
```shell
ollama launch codex --model nemotron-3-super:cloud
```
Codex often greets you with:
```
Model metadata for `nemotron-3-super:cloud` not found. Defaulting to fallback
metadata; this can degrade performance and cause issues.
```
This happens because Codex expects a static profile in your `~/.codex/config.toml` for every model it interacts with. But in a world where cloud model slugs like `nemotron-3`, `qwen-3.5`, and `glm-5` are moving at lightspeed, maintaining static profiles is a losing game.
I wanted a “Clean Path”—a way to treat the launcher input as dynamic metadata so I never have to touch a config file again.
The Problem: Static vs. Ephemeral
The traditional way to fix this is to hand-craft a profile:
```toml
# ~/.codex/config.toml
[[profiles]]
name = "nemotron-3-super:cloud"
model_context_window = 131072
auto_compact_token_limit = 80000
```
But as soon as you switch to `qwen-3.5:397b-cloud`, you’re back at the drawing board. The ownership boundary is the launcher handoff, not the static config.
The Solution: The “Launcher-as-Profile” Pattern
Instead of fighting the configuration, I built a lightweight global wrapper for codex. When Ollama launches a session, the wrapper intercepts the call, fetches the metadata from the local Ollama API, and synthesizes an ephemeral profile on the fly.
How it works under the hood
The wrapper looks for signals that it’s being launched in an Ollama-backed session (like `OPENAI_API_KEY=ollama`). It then:
- **Extracts the model slug:** It pulls the exact `--model` string passed by Ollama.
- **Queries the local gateway:** It hits `http://127.0.0.1:11434/api/show` to see what Ollama knows about that model.
- **Applies intelligent heuristics:** If cloud metadata is sparse, it uses family-aware logic:
  - `nemotron*` models get a 128k context window.
  - `qwen3*` models get a 256k context window.
  - Generic cloud models default to a safe 64k or 128k.
- **Injects runtime config:** It uses Codex’s `-c` flag to pass these values directly to the binary, bypassing the need for a persistent file.
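The heuristic and injection steps can be sketched as a pair of pure functions. Function names and the generic fallback value here are illustrative (the real `ollama-codex-wrapper.mjs` may structure this differently); the family thresholds mirror the list above.

```javascript
// Family-aware context-window heuristics for cloud models with sparse metadata.
function contextWindowFor(modelSlug) {
  if (/^nemotron/.test(modelSlug)) return 131072; // 128k for the Nemotron family
  if (/^qwen3/.test(modelSlug)) return 262144;    // 256k for the Qwen 3 family
  return 65536;                                   // conservative 64k default
}

// Derive the companion limits and render everything as Codex `-c` overrides.
function codexArgsFor(modelSlug) {
  const ctx = contextWindowFor(modelSlug);
  const autoCompact = Math.floor(ctx * 0.6); // compact well before the window fills
  return [
    '--oss', '-m', modelSlug,
    '-c', `model_context_window=${ctx}`,
    '-c', `auto_compact_token_limit=${autoCompact}`,
    '-c', 'tool_output_token_limit=12000', // fixed cap, as in the invocation below
  ];
}

console.log(codexArgsFor('nemotron-3-super:cloud').join(' '));
```

Keeping the heuristics as pure functions means they can be unit-tested without a running Ollama daemon.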
```shell
# What the wrapper actually executes:
codex-openai --oss -m nemotron-3-super:cloud \
  -c model_context_window=131072 \
  -c auto_compact_token_limit=78643 \
  -c tool_output_token_limit=12000
```
Why this is the “Senior” Move
As a developer, your terminal should be a Tactical Command Deck, not a maintenance chore. By moving the logic into a wrapper, we achieve:
- **Zero Config Churn:** No more `config.toml` sprawl.
- **Infrastructure Sovereignty:** You control how context windows are calculated per family.
- **Future Proofing:** If Ollama starts exposing richer metadata, we update the wrapper in one place, and every model benefits immediately.
Setting it up
If you’re already running my `ollama-codex-wrapper.mjs`, you’ve probably noticed your logs look much cleaner. For those who want to implement this:
- Move your original `codex` binary to `codex-openai`.
- Place the wrapper script at `/usr/local/bin/codex`.
- Ensure your wrapper caches generated catalogs in `~/.cache/ollama-launch/` for performance.
Testing with Nemotron 3
Once installed, the launch is seamless. You can even verify the dynamic overrides with a debug flag:
```shell
CODEX_OLLAMA_WRAPPER_DEBUG=1 ollama launch codex --model nemotron-3-super:cloud
```
You’ll see the wrapper instantly detecting the Nemotron family and assigning the correct 128k context window—no warnings, no degradation, just pure agentic speed.
🧠 Tactical FAQ: Dynamic Wrappers
Q: Does this affect my OpenAI/Anthropic profiles?
No. The wrapper only activates when it detects an ollama provider signal. Your standard API profiles remain untouched.
Q: Can I still override the context window manually?
Yes! I built in environment overrides: `CODEX_OLLAMA_CONTEXT_WINDOW=262144` will always take precedence over the heuristics.
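That precedence can be sketched as follows; `CODEX_OLLAMA_CONTEXT_WINDOW` is the env var named above, while the heuristic fallback values are illustrative:

```javascript
// Resolve the context window: explicit env override first, then family heuristics.
function resolveContextWindow(modelSlug, env = process.env) {
  const override = Number.parseInt(env.CODEX_OLLAMA_CONTEXT_WINDOW ?? '', 10);
  if (Number.isFinite(override) && override > 0) return override; // env always wins
  if (/^nemotron/.test(modelSlug)) return 131072; // family heuristic
  return 65536; // conservative default
}
```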
Q: Why not just use ollama launch claude?
Claude Code is fantastic, but Codex remains a powerhouse for specific refactoring and developer-grade code generation workflows. Having both in your “Agentic Stack” is the ultimate flex.
The goal isn’t just to use AI; it’s to build the infrastructure that makes using AI feel like magic. Still us. 💚
Explore the Framework
These concepts are part of a broader framework for building intent-aware AI systems. I've distilled these strategies into a short, practical guide called Thinking Modes.
View the Book →