OpenAI Codex provider cooldown: all models failed (no available auth profile)
Summary
In some sessions, OpenClaw fails before replying when every configured model in the fallback chain depends on openai-codex, and openai-codex profiles are unavailable (cooldown/rate-limit state).
Observed codex-only variants include:
- pure cooldown/unavailable profile errors
- first-model timeout + second-model cooldown/unavailable profile errors
This behaves like a provider/profile availability failure, not a content failure.
Environment
- OpenClaw version: not captured in the user report
- OS: not captured in the user report
- Channel: reported in direct chat workflow
- Models in the failed chain:
  - openai-codex/gpt-5.3-codex
  - openai-codex/gpt-5.3-codex-spark
Reproduction
- Configure primary and fallback models under the same provider (openai-codex).
- Trigger provider rate limiting or profile cooldown state (high request burst can cause this).
- Send a normal user prompt.
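The failing setup can be sketched as a chain in which every candidate resolves to one provider (the config shape below is hypothetical; OpenClaw's real schema was not captured in the report):

```python
# Hypothetical fallback-chain config; field names are illustrative,
# not OpenClaw's actual schema.
failing_chain = {
    "primary": "openai-codex/gpt-5.3-codex",
    "fallbacks": ["openai-codex/gpt-5.3-codex-spark"],
}

# Every candidate resolves to the same provider, so a provider-wide
# cooldown exhausts the whole chain at once.
candidates = [failing_chain["primary"], *failing_chain["fallbacks"]]
providers = {model.split("/")[0] for model in candidates}
assert providers == {"openai-codex"}  # no cross-provider diversity
```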
Expected vs actual
- Expected:
- A model in the chain responds, or fallback moves to another available provider.
- Actual:
- Agent fails before reply with codex provider/profile unavailability errors.
Exact reported errors (redacted variants):
Variant A
⚠️ Agent failed before reply: All models failed (2):
openai-codex/gpt-5.3-codex: No available auth profile for openai-codex (all in cooldown or unavailable). (rate_limit) |
openai-codex/gpt-5.3-codex-spark: Provider openai-codex is in cooldown (all profiles unavailable) (rate_limit).
Logs: openclaw logs --follow
Variant B
⚠️ Agent failed before reply: All models failed (2):
openai-codex/gpt-5.3-codex: LLM request timed out. (unknown) |
openai-codex/gpt-5.3-codex-spark: No available auth profile for openai-codex (all in cooldown or unavailable). (rate_limit).
Logs: openclaw logs --follow
Findings
- Core pattern: codex-only fallback can fully fail when profile availability collapses (cooldown/rate-limit).
- A first-model timeout can appear before/alongside cooldown errors on downstream candidates.
- Fallback is ineffective if all candidates share the same provider and that provider is unhealthy.
- This is likely a resilience/config gap (fallback diversity), not a prompt-specific content issue.
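The core pattern above can be illustrated with a minimal fallback loop (function and provider names are illustrative, not OpenClaw internals):

```python
# Minimal sketch of provider-level fallback exhaustion.
# All names here are illustrative, not OpenClaw internals.

def try_chain(chain, providers_in_cooldown):
    """Return the first reply, or raise once every candidate has failed."""
    errors = []
    for model in chain:
        provider = model.split("/")[0]
        if provider in providers_in_cooldown:
            errors.append(
                f"{model}: No available auth profile for {provider} "
                "(all in cooldown or unavailable). (rate_limit)"
            )
            continue
        return f"reply from {model}"
    raise RuntimeError(f"All models failed ({len(errors)}): " + " | ".join(errors))

chain = ["openai-codex/gpt-5.3-codex", "openai-codex/gpt-5.3-codex-spark"]

# Codex-only chain + codex cooldown -> total failure (Variant A shape).
try:
    try_chain(chain, providers_in_cooldown={"openai-codex"})
except RuntimeError as err:
    print(err)

# One non-codex candidate (hypothetical) restores a reply path.
print(try_chain(chain + ["other-provider/some-model"], {"openai-codex"}))
```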
Mitigation / Workaround
- Add cross-provider fallback (for example, one non-openai-codex model).
- Reduce bursty traffic and add spacing between heavy runs.
- Add/verify additional auth profiles for the provider if available.
- Retry after cooldown window.
- Capture supporting logs during failure:
  openclaw logs --follow
  openclaw status --all
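The retry-after-cooldown mitigation can be sketched as a backoff wrapper (the attempt counts and delays are assumptions, not OpenClaw defaults; `send` stands in for any request callable):

```python
import time

def retry_after_cooldown(send, attempts=3, base_delay=1.0):
    """Retry a request with exponential backoff between attempts.

    `send` is any callable that raises on rate-limit/cooldown errors;
    attempt counts and delays are illustrative, not OpenClaw defaults.
    """
    for attempt in range(attempts):
        try:
            return send()
        except RuntimeError:
            if attempt == attempts - 1:
                raise  # cooldown never cleared within the retry budget
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...

# Usage: a request that fails once (cooldown), then succeeds.
calls = {"n": 0}
def flaky_send():
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("rate_limit")
    return "ok"

print(retry_after_cooldown(flaky_send, base_delay=0.01))  # prints "ok"
```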
Risk / Impact
- Reliability: user gets no reply.
- Operations: repeated outages if traffic spikes continue.
- UX: appears as random total failure even with fallback configured.
- A related local Lighthouse note covers the broader multi-provider overlap.
- No upstream issue link has been added yet for this exact codex-only symptom in this note.
Next actions
References