
fix(fallback): implement runtime fallback chains for all foreground agents (#199)

* fix(fallback): implement runtime fallback chains for all foreground agents

Fixes #180 and #191.

Previously, `fallback.chains` was only consumed by BackgroundTaskManager
for background tasks. Foreground (interactive) agent sessions had no runtime
failover: the config hook resolved model arrays to a single string at plugin
init and never consulted `fallback.chains` again.

This commit closes the startup-time selection gap and adds true runtime
fallback via the OpenCode event system.

**Startup-time selection (config hook)**
The `config` hook now merges `fallback.chains` entries into the effective
model array before selecting which provider/model to start with. If the
primary model's provider is not configured, the resolver now considers the
full chain instead of only `_modelArray` entries.
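For illustration, the merge behaves roughly like the helper below (`mergeChain` is a hypothetical name for this sketch; the real logic lives inline in the config hook):

```typescript
// Hypothetical helper mirroring the described merge: chain entries are
// appended after the existing _modelArray entries, deduplicated by model id.
function mergeChain(modelArray: string[], chain: string[]): string[] {
  const seen = new Set(modelArray);
  const effective = [...modelArray];
  for (const model of chain) {
    if (!seen.has(model)) {
      seen.add(model);
      effective.push(model);
    }
  }
  return effective;
}
```

The resolver then walks the merged list in order and starts with the first entry whose provider is configured, instead of considering only `_modelArray` entries.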

**Runtime fallback — new ForegroundFallbackManager**
A new `ForegroundFallbackManager` (src/hooks/foreground-fallback/index.ts)
listens for rate-limit signals on foreground sessions via three events:
- `session.error` — session-level rate-limit error
- `message.updated` — per-message error with rate-limit in error payload;
  also tracks `sessionID → currentModel` and `sessionID → agentName`
- `session.status` type=retry — OpenCode's built-in retry-loop messages

On detection it:
1. Looks up the next untried model in the agent's configured chain
2. Calls `client.session.abort()` to stop the rate-limited prompt
3. Waits 500 ms for the server to settle (mirrors BackgroundTaskManager)
4. Re-queues the original user message via `client.session.promptAsync()`
   with the fallback model — returns immediately, does not block the handler
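The four steps can be sketched as follows; `tryFallback` and its signature are illustrative assumptions for this sketch, not the actual `ForegroundFallbackManager` internals:

```typescript
// Minimal client surface assumed by this sketch (not the real SDK types).
type FallbackClient = {
  session: {
    abort(args: { path: { id: string } }): Promise<unknown>;
    promptAsync(args: {
      path: { id: string };
      body: { model: { providerID: string; modelID: string }; parts: unknown[] };
    }): Promise<unknown>;
  };
};

async function tryFallback(
  client: FallbackClient,
  sessionID: string,
  chain: string[],
  tried: Set<string>,
  lastUserParts: unknown[],
): Promise<string | undefined> {
  // 1. Next untried model in the chain; undefined means the chain is exhausted
  const next = chain.find((m) => !tried.has(m));
  if (!next) return undefined;
  tried.add(next);

  // 2. Abort the rate-limited prompt
  await client.session.abort({ path: { id: sessionID } });

  // 3. Give the server time to settle (mirrors BackgroundTaskManager)
  await new Promise((resolve) => setTimeout(resolve, 500));

  // 4. Re-queue the original user message with the fallback model;
  //    promptAsync returns as soon as the prompt is queued, so the
  //    event handler is never blocked waiting for an LLM response
  const slash = next.indexOf('/');
  await client.session.promptAsync({
    path: { id: sessionID },
    body: {
      model: { providerID: next.slice(0, slash), modelID: next.slice(slash + 1) },
      parts: lastUserParts,
    },
  });
  return next;
}
```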

Each agent (orchestrator, oracle, designer, explorer, librarian, fixer) has
its own chain built from `_modelArray` entries merged with `fallback.chains`
config. Chain resolution priority: known agent name → infer from current
model → flatten all chains as last resort.

**balanceProviderUsage**
Remains unimplemented (schema-only). A follow-up is needed to define its
exact semantics before implementing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(fallback): address Greptile review — memory leak, stale comment, test coverage

- Add session.deleted case to ForegroundFallbackManager.handleEvent to
  clean up all five per-session maps (sessionModel, sessionAgent,
  sessionTried, inProgress, lastTrigger) and prevent unbounded memory
  growth in long-running instances with many subagent sessions

- Fix stale comment in src/index.ts that incorrectly stated runtime
  failover for foreground agents is unsupported; updated to accurately
  describe that it is handled by ForegroundFallbackManager

- Replace misleading "chain exhaustion" test (which only verified the
  dedup window) with two tests that hit the actual exhaustion code path:
  chain.find(m => !tried.has(m)) === undefined

- Add two session.deleted tests: one verifying state cleanup enables a
  fresh fallback after deletion, one verifying no-op on missing sessionID

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(fallback): fix resolveChain cross-agent contamination and session.deleted dual-shape

Two P1 logic bugs identified in second Greptile review:

1. resolveChain cross-agent contamination
   When a session's agent name is known but that agent has no configured
   fallback chain, resolveChain was falling through to the "infer from
   current model" and "last resort flatten all chains" paths. This could
   re-prompt an oracle or explorer session with a model from orchestrator's
   chain — a correctness regression vs the previous no-fallback behavior.

   Fix: when agent name IS known, return that agent's chain or [] immediately.
   The cross-agent fallback paths are now only reachable when the agent name
   is genuinely unknown (e.g. session.error arrives before any message.updated
   has established the agent name for that session).

2. session.deleted dual event shape
   OpenCode emits session.deleted in two shapes depending on context:
     { properties: { sessionID } }     — subagent / background task sessions
     { properties: { info: { id } } }  — top-level session deletion
   The previous cleanup only read properties.sessionID, leaving state
   accumulated for top-level sessions (the info.id shape).

   Fix: mirror the same dual-shape lookup used by BackgroundTaskManager
   (properties?.info?.id ?? properties?.sessionID).
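Taken together, the two fixes can be sketched as below (function names are illustrative, not the actual implementation):

```typescript
// Fix 1 — agent isolation: a known agent never borrows another agent's chain.
function resolveChain(
  chains: Record<string, string[]>,
  agentName: string | undefined,
  currentModel: string | undefined,
): string[] {
  // Known agent: its chain or nothing — no cross-agent fallthrough
  if (agentName !== undefined) return chains[agentName] ?? [];
  // Unknown agent: infer the chain from the current model's membership
  if (currentModel !== undefined) {
    for (const chain of Object.values(chains)) {
      if (chain.includes(currentModel)) return chain;
    }
  }
  // Last resort: flatten all chains, deduplicated
  return [...new Set(Object.values(chains).flat())];
}

// Fix 2 — dual-shape sessionID lookup for session.deleted
function deletedSessionID(properties?: {
  sessionID?: string;
  info?: { id?: string };
}): string | undefined {
  return properties?.info?.id ?? properties?.sessionID;
}
```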

Adds 3 new tests:
- session.deleted with info.id shape clears state and resets dedup
- known agent with no configured chain does not bleed into other agents
- unknown agent correctly uses last-resort cross-agent chain

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: vinz-bambico <vinz@jmancurly.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
John Michael Vincent Bambico · 3 weeks ago · commit 231531ae6c

src/config/model-resolution.test.ts (+142 −0)

@@ -119,3 +119,145 @@ describe('model array resolution', () => {
     expect(result).toBeNull();
   });
 });
+
+/**
+ * Tests for the fallback.chains merging logic that runs in the config hook.
+ * Mirrors the effectiveArrays construction in src/index.ts.
+ */
+describe('fallback.chains merging for foreground agents', () => {
+  /**
+   * Simulates the effectiveArrays construction + resolution from src/index.ts.
+   * Returns the resolved model string or null.
+   */
+  function resolveWithChains(opts: {
+    modelArray?: Array<{ id: string; variant?: string }>;
+    currentModel?: string;
+    chainModels?: string[];
+    providerConfig?: Record<string, unknown>;
+    fallbackEnabled?: boolean;
+  }): string | null {
+    const {
+      modelArray,
+      currentModel,
+      chainModels,
+      providerConfig,
+      fallbackEnabled = true,
+    } = opts;
+
+    // Build effectiveArrays (mirrors index.ts logic)
+    const effectiveArray: Array<{ id: string; variant?: string }> = modelArray
+      ? [...modelArray]
+      : [];
+
+    if (fallbackEnabled && chainModels && chainModels.length > 0) {
+      if (effectiveArray.length === 0 && currentModel) {
+        effectiveArray.push({ id: currentModel });
+      }
+      const seen = new Set(effectiveArray.map((m) => m.id));
+      for (const chainModel of chainModels) {
+        if (!seen.has(chainModel)) {
+          seen.add(chainModel);
+          effectiveArray.push({ id: chainModel });
+        }
+      }
+    }
+
+    if (effectiveArray.length === 0) return null;
+
+    const hasProviderConfig =
+      providerConfig && Object.keys(providerConfig).length > 0;
+
+    if (hasProviderConfig) {
+      const configuredProviders = Object.keys(providerConfig);
+      for (const modelEntry of effectiveArray) {
+        const slashIdx = modelEntry.id.indexOf('/');
+        if (slashIdx === -1) continue;
+        const providerID = modelEntry.id.slice(0, slashIdx);
+        if (configuredProviders.includes(providerID)) {
+          return modelEntry.id;
+        }
+      }
+    }
+
+    return effectiveArray[0].id;
+  }
+
+  test('fallback.chains used when agent has a string model and primary provider is not configured', () => {
+    const result = resolveWithChains({
+      currentModel: 'anthropic/claude-opus-4-5',
+      chainModels: ['openai/gpt-4o', 'google/gemini-pro'],
+      providerConfig: { openai: {} }, // only openai configured
+    });
+    expect(result).toBe('openai/gpt-4o');
+  });
+
+  test('primary model wins when its provider IS configured', () => {
+    const result = resolveWithChains({
+      currentModel: 'anthropic/claude-opus-4-5',
+      chainModels: ['openai/gpt-4o'],
+      providerConfig: { anthropic: {}, openai: {} },
+    });
+    expect(result).toBe('anthropic/claude-opus-4-5');
+  });
+
+  test('falls through full chain to find a configured provider', () => {
+    const result = resolveWithChains({
+      currentModel: 'anthropic/claude-opus-4-5',
+      chainModels: ['openai/gpt-4o', 'google/gemini-2.5-pro'],
+      providerConfig: { google: {} }, // only google configured
+    });
+    expect(result).toBe('google/gemini-2.5-pro');
+  });
+
+  test('falls back to primary (first) when no chain provider is configured', () => {
+    const result = resolveWithChains({
+      currentModel: 'anthropic/claude-opus-4-5',
+      chainModels: ['openai/gpt-4o'],
+      providerConfig: {}, // nothing configured
+    });
+    expect(result).toBe('anthropic/claude-opus-4-5');
+  });
+
+  test('chain is ignored when fallback disabled', () => {
+    const result = resolveWithChains({
+      currentModel: 'anthropic/claude-opus-4-5',
+      chainModels: ['openai/gpt-4o'],
+      providerConfig: { openai: {} },
+      fallbackEnabled: false,
+    });
+    // chain not applied; no effectiveArray entry → falls through to null (no _modelArray either)
+    expect(result).toBeNull();
+  });
+
+  test('_modelArray entries take precedence and chain appends after', () => {
+    const result = resolveWithChains({
+      modelArray: [
+        { id: 'anthropic/claude-opus-4-5' },
+        { id: 'anthropic/claude-sonnet-4-5' },
+      ],
+      chainModels: ['openai/gpt-4o'],
+      providerConfig: { openai: {} }, // only openai configured
+    });
+    // anthropic entries in array are skipped; openai/gpt-4o from chain is picked
+    expect(result).toBe('openai/gpt-4o');
+  });
+
+  test('duplicate model ids across array and chain are deduplicated', () => {
+    // openai/gpt-4o appears in both _modelArray and chains — should not duplicate
+    const result = resolveWithChains({
+      modelArray: [{ id: 'anthropic/claude-opus-4-5' }, { id: 'openai/gpt-4o' }],
+      chainModels: ['openai/gpt-4o', 'google/gemini-pro'],
+      providerConfig: { openai: {} },
+    });
+    expect(result).toBe('openai/gpt-4o');
+  });
+
+  test('no currentModel and no _modelArray with chain still resolves', () => {
+    // Edge case: agent has no model set yet, chain provides candidates
+    const result = resolveWithChains({
+      chainModels: ['openai/gpt-4o', 'anthropic/claude-sonnet-4-5'],
+      providerConfig: { anthropic: {} },
+    });
+    expect(result).toBe('anthropic/claude-sonnet-4-5');
+  });
+});

src/hooks/foreground-fallback/index.test.ts (+624 −0)

@@ -0,0 +1,624 @@
+import { describe, expect, mock, test, beforeEach } from 'bun:test';
+import { ForegroundFallbackManager, isRateLimitError } from './index';
+
+// ---------------------------------------------------------------------------
+// Helpers
+// ---------------------------------------------------------------------------
+
+function createMockClient(overrides?: {
+  promptAsyncImpl?: (args: unknown) => Promise<unknown>;
+  messagesData?: Array<{ info: { role: string }; parts: unknown[] }>;
+}) {
+  const promptAsync = mock(async (args: unknown) => {
+    if (overrides?.promptAsyncImpl) return overrides.promptAsyncImpl(args);
+    return {};
+  });
+  const abort = mock(async () => ({}));
+  const messages = mock(async () => ({
+    data: overrides?.messagesData ?? [
+      { info: { role: 'user' }, parts: [{ type: 'text', text: 'hello' }] },
+    ],
+  }));
+
+  return {
+    client: {
+      session: {
+        abort,
+        messages,
+        // promptAsync is cast at runtime — expose via the session object
+        promptAsync,
+      },
+    } as unknown as ConstructorParameters<typeof ForegroundFallbackManager>[0],
+    mocks: { promptAsync, abort, messages },
+  };
+}
+
+function makeChains(
+  overrides?: Record<string, string[]>,
+): Record<string, string[]> {
+  return {
+    orchestrator: [
+      'anthropic/claude-opus-4-5',
+      'openai/gpt-4o',
+      'google/gemini-2.5-pro',
+    ],
+    explorer: ['openai/gpt-4o-mini', 'anthropic/claude-haiku'],
+    ...overrides,
+  };
+}
+
+// ---------------------------------------------------------------------------
+// isRateLimitError
+// ---------------------------------------------------------------------------
+
+describe('isRateLimitError', () => {
+  test('returns true for 429 status code', () => {
+    expect(isRateLimitError({ data: { statusCode: 429 } })).toBe(true);
+  });
+
+  test('returns true for "rate limit" in message', () => {
+    expect(isRateLimitError({ message: 'Rate limit exceeded' })).toBe(true);
+  });
+
+  test('returns true for "quota exceeded" in responseBody', () => {
+    expect(
+      isRateLimitError({ data: { responseBody: 'quota exceeded' } }),
+    ).toBe(true);
+  });
+
+  test('returns true for "usage exceeded"', () => {
+    expect(isRateLimitError({ message: 'usage exceeded' })).toBe(true);
+  });
+
+  test('returns true for "overloaded"', () => {
+    expect(isRateLimitError({ message: 'overloaded_error' })).toBe(true);
+  });
+
+  test('returns false for non-rate-limit error', () => {
+    expect(isRateLimitError({ message: 'invalid API key' })).toBe(false);
+  });
+
+  test('returns false for null', () => {
+    expect(isRateLimitError(null)).toBe(false);
+  });
+
+  test('returns false for non-object', () => {
+    expect(isRateLimitError('string error')).toBe(false);
+  });
+});
+
+// ---------------------------------------------------------------------------
+// ForegroundFallbackManager — disabled
+// ---------------------------------------------------------------------------
+
+describe('ForegroundFallbackManager (disabled)', () => {
+  test('does nothing when enabled=false', async () => {
+    const { client, mocks } = createMockClient();
+    const mgr = new ForegroundFallbackManager(client, makeChains(), false);
+
+    await mgr.handleEvent({
+      type: 'session.error',
+      properties: {
+        sessionID: 'sess-1',
+        error: { message: 'rate limit exceeded' },
+      },
+    });
+
+    expect(mocks.promptAsync).not.toHaveBeenCalled();
+  });
+});
+
+// ---------------------------------------------------------------------------
+// ForegroundFallbackManager — session.error
+// ---------------------------------------------------------------------------
+
+describe('ForegroundFallbackManager session.error', () => {
+  let client: ReturnType<typeof createMockClient>['client'];
+  let mocks: ReturnType<typeof createMockClient>['mocks'];
+  let mgr: ForegroundFallbackManager;
+
+  beforeEach(() => {
+    ({ client, mocks } = createMockClient());
+    mgr = new ForegroundFallbackManager(client, makeChains(), true);
+  });
+
+  test('triggers fallback on rate-limit session.error', async () => {
+    // First teach the manager which model is in use for this session
+    await mgr.handleEvent({
+      type: 'message.updated',
+      properties: {
+        info: {
+          sessionID: 'sess-1',
+          providerID: 'anthropic',
+          modelID: 'claude-opus-4-5',
+          role: 'assistant',
+        },
+      },
+    });
+
+    await mgr.handleEvent({
+      type: 'session.error',
+      properties: {
+        sessionID: 'sess-1',
+        error: { message: 'Rate limit exceeded' },
+      },
+    });
+
+    expect(mocks.abort).toHaveBeenCalledTimes(1);
+    expect(mocks.promptAsync).toHaveBeenCalledTimes(1);
+
+    const call = mocks.promptAsync.mock.calls[0] as [
+      {
+        path: { id: string };
+        body: { model: { providerID: string; modelID: string } };
+      },
+    ];
+    expect(call[0].path.id).toBe('sess-1');
+    // Should have picked the next model after anthropic/claude-opus-4-5
+    expect(call[0].body.model.providerID).toBe('openai');
+    expect(call[0].body.model.modelID).toBe('gpt-4o');
+  });
+
+  test('does nothing when error is not a rate limit', async () => {
+    await mgr.handleEvent({
+      type: 'session.error',
+      properties: {
+        sessionID: 'sess-1',
+        error: { message: 'invalid request' },
+      },
+    });
+
+    expect(mocks.promptAsync).not.toHaveBeenCalled();
+  });
+
+  test('does nothing when no chain configured for session', async () => {
+    const emptyMgr = new ForegroundFallbackManager(client, {}, true);
+    await emptyMgr.handleEvent({
+      type: 'session.error',
+      properties: {
+        sessionID: 'sess-1',
+        error: { message: 'rate limit exceeded' },
+      },
+    });
+
+    expect(mocks.promptAsync).not.toHaveBeenCalled();
+  });
+});
+
+// ---------------------------------------------------------------------------
+// ForegroundFallbackManager — message.updated
+// ---------------------------------------------------------------------------
+
+describe('ForegroundFallbackManager message.updated', () => {
+  test('tracks model from message.updated and falls back on error', async () => {
+    const { client, mocks } = createMockClient();
+    const mgr = new ForegroundFallbackManager(client, makeChains(), true);
+
+    await mgr.handleEvent({
+      type: 'message.updated',
+      properties: {
+        info: {
+          sessionID: 'sess-2',
+          providerID: 'anthropic',
+          modelID: 'claude-opus-4-5',
+          error: { message: 'rate limit exceeded' },
+        },
+      },
+    });
+
+    expect(mocks.promptAsync).toHaveBeenCalledTimes(1);
+    const call = mocks.promptAsync.mock.calls[0] as [
+      {
+        body: { model: { providerID: string; modelID: string } };
+      },
+    ];
+    expect(call[0].body.model.providerID).toBe('openai');
+    expect(call[0].body.model.modelID).toBe('gpt-4o');
+  });
+
+  test('uses agent name from message.updated to select correct chain', async () => {
+    const { client, mocks } = createMockClient();
+    const mgr = new ForegroundFallbackManager(client, makeChains(), true);
+
+    // explorer message with its model
+    await mgr.handleEvent({
+      type: 'message.updated',
+      properties: {
+        info: {
+          sessionID: 'sess-3',
+          agent: 'explorer',
+          providerID: 'openai',
+          modelID: 'gpt-4o-mini',
+          error: { message: 'quota exceeded' },
+        },
+      },
+    });
+
+    expect(mocks.promptAsync).toHaveBeenCalledTimes(1);
+    const call = mocks.promptAsync.mock.calls[0] as [
+      {
+        body: { model: { providerID: string; modelID: string } };
+      },
+    ];
+    // explorer chain: ['openai/gpt-4o-mini', 'anthropic/claude-haiku']
+    // current=gpt-4o-mini is tried → next = claude-haiku
+    expect(call[0].body.model.providerID).toBe('anthropic');
+    expect(call[0].body.model.modelID).toBe('claude-haiku');
+  });
+});
+
+// ---------------------------------------------------------------------------
+// ForegroundFallbackManager — session.status retry
+// ---------------------------------------------------------------------------
+
+describe('ForegroundFallbackManager session.status', () => {
+  test('triggers fallback on retry status with rate limit message', async () => {
+    const { client, mocks } = createMockClient();
+    const mgr = new ForegroundFallbackManager(client, makeChains(), true);
+
+    // Pre-seed model
+    await mgr.handleEvent({
+      type: 'message.updated',
+      properties: {
+        info: {
+          sessionID: 'sess-4',
+          providerID: 'anthropic',
+          modelID: 'claude-opus-4-5',
+        },
+      },
+    });
+
+    await mgr.handleEvent({
+      type: 'session.status',
+      properties: {
+        sessionID: 'sess-4',
+        status: { type: 'retry', message: 'usage limit reached, retrying...' },
+      },
+    });
+
+    expect(mocks.promptAsync).toHaveBeenCalledTimes(1);
+  });
+
+  test('ignores session.status with non-rate-limit retry message', async () => {
+    const { client, mocks } = createMockClient();
+    const mgr = new ForegroundFallbackManager(client, makeChains(), true);
+
+    await mgr.handleEvent({
+      type: 'session.status',
+      properties: {
+        sessionID: 'sess-4',
+        status: { type: 'retry', message: 'connection timeout, retrying...' },
+      },
+    });
+
+    expect(mocks.promptAsync).not.toHaveBeenCalled();
+  });
+});
+
+// ---------------------------------------------------------------------------
+// ForegroundFallbackManager — chain exhaustion
+// ---------------------------------------------------------------------------
+
+describe('ForegroundFallbackManager chain exhaustion', () => {
+  test('does not call promptAsync when the only chain model is already the current model', async () => {
+    // Scenario: chain = ['openai/gpt-b'], current model IS 'openai/gpt-b'.
+    // tryFallback adds 'openai/gpt-b' to tried → chain.find() returns undefined → exhausted.
+    const { client, mocks } = createMockClient();
+    const mgr = new ForegroundFallbackManager(
+      client,
+      { orchestrator: ['openai/gpt-b'] },
+      true,
+    );
+
+    // Seed current model as the only chain entry
+    await mgr.handleEvent({
+      type: 'message.updated',
+      properties: {
+        info: {
+          sessionID: 's',
+          providerID: 'openai',
+          modelID: 'gpt-b',
+        },
+      },
+    });
+
+    // Rate limit fires — only model in chain is already current → nothing to fall back to
+    await mgr.handleEvent({
+      type: 'session.error',
+      properties: { sessionID: 's', error: { message: 'rate limit exceeded' } },
+    });
+
+    expect(mocks.promptAsync).not.toHaveBeenCalled();
+  });
+
+  test('does not call promptAsync when all chain models have been tried', async () => {
+    // Two managers exercise the exhaustion path back to back:
+    // 1. chain = [model-x, model-y], current = model-x → model-y is picked.
+    // 2. A fresh manager whose only chain entry IS the current model →
+    //    tried contains every chain model, chain.find() returns undefined,
+    //    and promptAsync is never called.
+    const { client, mocks } = createMockClient();
+    const chain = ['openai/model-x', 'openai/model-y'];
+    const mgr = new ForegroundFallbackManager(
+      client,
+      { orchestrator: chain },
+      true,
+    );
+
+    // Session A: current model is model-x, which IS in the chain → picks model-y ✓
+    await mgr.handleEvent({
+      type: 'message.updated',
+      properties: {
+        info: {
+          sessionID: 'sess-exhaust',
+          agent: 'orchestrator',
+          providerID: 'openai',
+          modelID: 'model-x',
+          error: { message: 'rate limit exceeded' },
+        },
+      },
+    });
+    expect(mocks.promptAsync).toHaveBeenCalledTimes(1);
+
+    // Session B (fresh session, different ID): only model-y is in chain and it IS
+    // the current model → tried gets model-y → chain.find() = undefined → exhausted
+    const { client: client2, mocks: mocks2 } = createMockClient();
+    const mgr2 = new ForegroundFallbackManager(
+      client2,
+      { orchestrator: ['openai/model-y'] }, // single-entry chain already in use
+      true,
+    );
+    await mgr2.handleEvent({
+      type: 'message.updated',
+      properties: {
+        info: {
+          sessionID: 'sess-exhaust-2',
+          agent: 'orchestrator',
+          providerID: 'openai',
+          modelID: 'model-y',
+          error: { message: 'rate limit exceeded' },
+        },
+      },
+    });
+    expect(mocks2.promptAsync).not.toHaveBeenCalled();
+  });
+});
+
+// ---------------------------------------------------------------------------
+// ForegroundFallbackManager — deduplication
+// ---------------------------------------------------------------------------
+
+describe('ForegroundFallbackManager deduplication', () => {
+  test('ignores a second trigger within dedup window for same session', async () => {
+    const { client, mocks } = createMockClient();
+    const mgr = new ForegroundFallbackManager(client, makeChains(), true);
+
+    const event = {
+      type: 'session.error',
+      properties: {
+        sessionID: 'sess-dup',
+        error: { message: 'rate limit exceeded' },
+      },
+    };
+
+    await mgr.handleEvent(event);
+    await mgr.handleEvent(event); // immediate second trigger — should be deduped
+
+    expect(mocks.promptAsync).toHaveBeenCalledTimes(1);
+  });
+
+  test('different sessions are not deduplicated against each other', async () => {
+    const { client, mocks } = createMockClient();
+    const mgr = new ForegroundFallbackManager(client, makeChains(), true);
+
+    await mgr.handleEvent({
+      type: 'session.error',
+      properties: { sessionID: 'sess-A', error: { message: 'rate limit' } },
+    });
+    await mgr.handleEvent({
+      type: 'session.error',
+      properties: { sessionID: 'sess-B', error: { message: 'rate limit' } },
+    });
+
+    expect(mocks.promptAsync).toHaveBeenCalledTimes(2);
+  });
+});
+
+// ---------------------------------------------------------------------------
+// ForegroundFallbackManager — subagent.session.created
+// ---------------------------------------------------------------------------
+
+describe('ForegroundFallbackManager subagent.session.created', () => {
+  test('records agent name from subagent.session.created when agentName provided', async () => {
+    const { client, mocks } = createMockClient();
+    const mgr = new ForegroundFallbackManager(client, makeChains(), true);
+
+    // Register the session as 'explorer' via subagent creation event
+    await mgr.handleEvent({
+      type: 'subagent.session.created',
+      properties: { sessionID: 'sub-1', agentName: 'explorer' },
+    });
+
+    // Now trigger rate limit — should use explorer's chain
+    await mgr.handleEvent({
+      type: 'session.error',
+      properties: { sessionID: 'sub-1', error: { message: 'rate limit' } },
+    });
+
+    expect(mocks.promptAsync).toHaveBeenCalledTimes(1);
+    const call = mocks.promptAsync.mock.calls[0] as [
+      {
+        body: { model: { providerID: string; modelID: string } };
+      },
+    ];
+    // explorer chain: ['openai/gpt-4o-mini', 'anthropic/claude-haiku']
+    // no current model tracked → first untried = openai/gpt-4o-mini
+    expect(call[0].body.model.providerID).toBe('openai');
+    expect(call[0].body.model.modelID).toBe('gpt-4o-mini');
+  });
+});
+
+// ---------------------------------------------------------------------------
+// ForegroundFallbackManager — session.deleted cleanup
+// ---------------------------------------------------------------------------
+
+describe('ForegroundFallbackManager session.deleted', () => {
+  test('cleans up session state on session.deleted preventing memory leaks', async () => {
+    const { client, mocks } = createMockClient();
+    const mgr = new ForegroundFallbackManager(client, makeChains(), true);
+
+    // Populate all maps for this session
+    await mgr.handleEvent({
+      type: 'message.updated',
+      properties: {
+        info: {
+          sessionID: 'sess-del',
+          agent: 'orchestrator',
+          providerID: 'anthropic',
+          modelID: 'claude-opus-4-5',
+        },
+      },
+    });
+
+    // Delete the session
+    await mgr.handleEvent({
+      type: 'session.deleted',
+      properties: { sessionID: 'sess-del' },
+    });
+
+    // After deletion, a new rate-limit on the same ID should behave as a fresh
+    // session (no prior model known → uses chain from start, dedup cleared)
+    await mgr.handleEvent({
+      type: 'session.error',
+      properties: {
+        sessionID: 'sess-del',
+        error: { message: 'rate limit exceeded' },
+      },
+    });
+
+    // Should have triggered (dedup was cleared by session.deleted)
+    // and should pick the first chain model (no current model seed after deletion)
+    expect(mocks.promptAsync).toHaveBeenCalledTimes(1);
+    const call = mocks.promptAsync.mock.calls[0] as [
+      { body: { model: { providerID: string; modelID: string } } },
+    ];
+    // orchestrator chain: ['anthropic/claude-opus-4-5', 'openai/gpt-4o', 'google/gemini-2.5-pro']
+    // no current model → first untried = anthropic/claude-opus-4-5
+    expect(call[0].body.model.providerID).toBe('anthropic');
+    expect(call[0].body.model.modelID).toBe('claude-opus-4-5');
+  });
+
+  test('ignores session.deleted with no sessionID', async () => {
+    const { client } = createMockClient();
+    const mgr = new ForegroundFallbackManager(client, makeChains(), true);
+    // Should not throw
+    await expect(
+      mgr.handleEvent({ type: 'session.deleted', properties: {} }),
+    ).resolves.toBeUndefined();
+  });
+
+  test('cleans up state using info.id shape (top-level session deletion)', async () => {
+    // OpenCode emits { properties: { info: { id } } } for top-level sessions
+    // and { properties: { sessionID } } for subagent sessions. Both must clean up.
+    const { client, mocks } = createMockClient();
+    const mgr = new ForegroundFallbackManager(client, makeChains(), true);
+
+    // Seed state for the session
+    await mgr.handleEvent({
+      type: 'message.updated',
+      properties: {
+        info: {
+          sessionID: 'sess-info-del',
+          agent: 'orchestrator',
+          providerID: 'anthropic',
+          modelID: 'claude-opus-4-5',
+        },
+      },
+    });
+
+    // Delete via the info.id shape
+    await mgr.handleEvent({
+      type: 'session.deleted',
+      properties: { info: { id: 'sess-info-del' } },
+    });
+
+    // State is cleared: a new rate-limit on same ID should behave as fresh session
+    await mgr.handleEvent({
+      type: 'session.error',
+      properties: {
+        sessionID: 'sess-info-del',
+        error: { message: 'rate limit exceeded' },
+      },
+    });
+
+    // Triggered (dedup was cleared by deletion)
+    expect(mocks.promptAsync).toHaveBeenCalledTimes(1);
+  });
+});
+
+// ---------------------------------------------------------------------------
+// ForegroundFallbackManager — resolveChain correctness
+// ---------------------------------------------------------------------------
+
+describe('ForegroundFallbackManager resolveChain cross-agent isolation', () => {
+  test('does not use another agent chain when known agent has no configured chain', async () => {
+    // oracle has no chain in runtimeChains; without the fix resolveChain would
+    // fall through to the cross-agent "last resort" and pick a model from
+    // orchestrator's chain — re-prompting oracle with an orchestrator model.
+    const { client, mocks } = createMockClient();
+    const mgr = new ForegroundFallbackManager(
+      client,
+      {
+        // oracle intentionally absent — no chain configured
+        orchestrator: ['openai/gpt-4o', 'google/gemini-2.5-pro'],
+      },
+      true,
+    );
+
+    await mgr.handleEvent({
+      type: 'message.updated',
+      properties: {
+        info: {
+          sessionID: 'oracle-sess',
+          agent: 'oracle', // agent IS known
+          providerID: 'anthropic',
+          modelID: 'claude-opus-4-5',
+          error: { message: 'rate limit exceeded' },
+        },
+      },
+    });
+
+    // oracle has no chain → should not fall back at all
+    expect(mocks.promptAsync).not.toHaveBeenCalled();
+  });
+
+  test('uses cross-agent last-resort only when agent name is unknown', async () => {
+    // When the agent name is genuinely unknown AND current model is not in any
+    // chain, the last-resort flattened chain is acceptable.
+    const { client, mocks } = createMockClient();
+    const mgr = new ForegroundFallbackManager(
+      client,
+      { orchestrator: ['openai/gpt-4o'] },
+      true,
+    );
+
+    // No agent name tracked, no model tracked — triggers session.error
+    await mgr.handleEvent({
+      type: 'session.error',
+      properties: {
+        sessionID: 'unknown-agent-sess',
+        error: { message: 'rate limit exceeded' },
+      },
+    });
+
+    // Falls through to last-resort → picks first model from any chain
+    expect(mocks.promptAsync).toHaveBeenCalledTimes(1);
+    const call = mocks.promptAsync.mock.calls[0] as [
+      { body: { model: { providerID: string; modelID: string } } },
+    ];
+    expect(call[0].body.model.providerID).toBe('openai');
+    expect(call[0].body.model.modelID).toBe('gpt-4o');
+  });
+});
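For readers outside the test harness, the `resolveChain` priority these tests pin down can be sketched standalone. The logic mirrors the implementation added in `src/hooks/foreground-fallback/index.ts` below; the chain contents here are illustrative, not the project's real defaults:

```typescript
type Chains = Record<string, string[]>;

function resolveChain(
  chains: Chains,
  agentName: string | undefined,
  currentModel: string | undefined,
): string[] {
  // 1+2. Known agent: use its chain exactly, or no chain at all.
  if (agentName) return chains[agentName] ?? [];
  // 3. Unknown agent: infer the chain from the current model.
  if (currentModel) {
    for (const chain of Object.values(chains)) {
      if (chain.includes(currentModel)) return chain;
    }
  }
  // 4. Last resort: flatten all chains, preserving insertion order.
  const all: string[] = [];
  const seen = new Set<string>();
  for (const chain of Object.values(chains)) {
    for (const m of chain) {
      if (!seen.has(m)) {
        seen.add(m);
        all.push(m);
      }
    }
  }
  return all;
}

const chains: Chains = {
  orchestrator: ['openai/gpt-4o', 'google/gemini-2.5-pro'],
};
```

With this shape, `resolveChain(chains, 'oracle', 'anthropic/claude-opus-4-5')` returns `[]` — the known-but-unconfigured agent never borrows the orchestrator's chain, which is exactly the isolation asserted above.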

+ 363 - 0
src/hooks/foreground-fallback/index.ts

@@ -0,0 +1,363 @@
+/**
+ * Runtime model fallback for foreground (interactive) agent sessions.
+ *
+ * When OpenCode fires a session.error, message.updated, or session.status
+ * event containing a rate-limit signal, this manager:
+ *   1. Looks up the next untried model in the agent's configured chain
+ *   2. Aborts the rate-limited prompt via client.session.abort()
+ *   3. Re-queues the last user message via client.session.promptAsync()
+ *      with the new model — promptAsync returns immediately so we never
+ *      block the event handler waiting for a full LLM response.
+ *
+ * This mirrors the BackgroundTaskManager's fallback loop but operates
+ * reactively through the event system instead of wrapping prompt() in a
+ * try/catch, which is not possible for interactive (foreground) sessions.
+ */
+
+import type { PluginInput } from '@opencode-ai/plugin';
+import { log } from '../../utils/logger';
+
+type OpencodeClient = PluginInput['client'];
+
+// ---------------------------------------------------------------------------
+// Rate-limit detection
+// ---------------------------------------------------------------------------
+
+const RATE_LIMIT_PATTERNS = [
+  /\b429\b/,
+  /rate.?limit/i,
+  /too many requests/i,
+  /quota.?exceeded/i,
+  /usage.?exceeded/i,
+  /usage limit/i,
+  /overloaded/i,
+  /resource.?exhausted/i,
+  /insufficient.?quota/i,
+  /high concurrency/i,
+  /reduce concurrency/i,
+];
+
+export function isRateLimitError(error: unknown): boolean {
+  if (!error || typeof error !== 'object') return false;
+  const err = error as {
+    message?: string;
+    data?: { statusCode?: number; message?: string; responseBody?: string };
+  };
+  const text = [
+    err.message ?? '',
+    String(err.data?.statusCode ?? ''),
+    err.data?.message ?? '',
+    err.data?.responseBody ?? '',
+  ].join(' ');
+  return RATE_LIMIT_PATTERNS.some((p) => p.test(text));
+}
+
+// ---------------------------------------------------------------------------
+// Helpers
+// ---------------------------------------------------------------------------
+
+function parseModel(
+  model: string,
+): { providerID: string; modelID: string } | null {
+  const slash = model.indexOf('/');
+  if (slash <= 0 || slash >= model.length - 1) return null;
+  return { providerID: model.slice(0, slash), modelID: model.slice(slash + 1) };
+}
+
+/** Prevent re-triggering within this window for the same session. */
+const DEDUP_WINDOW_MS = 5_000;
+
+// ---------------------------------------------------------------------------
+// Manager
+// ---------------------------------------------------------------------------
+
+/**
+ * Manages runtime model fallback for foreground agent sessions.
+ *
+ * Constructed at plugin init with the ordered fallback chains for each agent
+ * (built from _modelArray entries merged with fallback.chains config).
+ */
+export class ForegroundFallbackManager {
+  /** sessionID → last observed model string ("providerID/modelID") */
+  private readonly sessionModel = new Map<string, string>();
+  /** sessionID → agent name (populated from message.updated info.agent field) */
+  private readonly sessionAgent = new Map<string, string>();
+  /** sessionID → set of models already attempted this session */
+  private readonly sessionTried = new Map<string, Set<string>>();
+  /** Sessions with an active fallback switch in flight */
+  private readonly inProgress = new Set<string>();
+  /** sessionID → timestamp of last trigger (for deduplication) */
+  private readonly lastTrigger = new Map<string, number>();
+
+  constructor(
+    private readonly client: OpencodeClient,
+    /**
+     * Ordered fallback chains per agent.
+     * e.g. { orchestrator: ['anthropic/claude-opus-4-5', 'openai/gpt-4o'] }
+     * The first model that hasn't been tried yet is selected on each fallback.
+     */
+    private readonly chains: Record<string, string[]>,
+    private readonly enabled: boolean,
+  ) {}
+
+  /**
+   * Process an OpenCode plugin event.
+   * Call this from the plugin's `event` hook for every event received.
+   */
+  async handleEvent(rawEvent: unknown): Promise<void> {
+    if (!this.enabled) return;
+    const event = rawEvent as { type: string; properties?: unknown };
+    if (!event?.type) return;
+
+    switch (event.type) {
+      case 'message.updated': {
+        const info = (
+          event.properties as { info?: Record<string, unknown> } | undefined
+        )?.info;
+        if (!info) break;
+        const sessionID = info.sessionID as string | undefined;
+        if (!sessionID) break;
+        // Capture agent name when available (OpenCode includes it on subagent messages)
+        if (typeof info.agent === 'string') {
+          this.sessionAgent.set(sessionID, info.agent);
+        }
+        // Track the model currently serving this session
+        if (
+          typeof info.providerID === 'string' &&
+          typeof info.modelID === 'string'
+        ) {
+          this.sessionModel.set(sessionID, `${info.providerID}/${info.modelID}`);
+        }
+        // Rate-limit on an individual message
+        if (info.error && isRateLimitError(info.error)) {
+          await this.tryFallback(sessionID);
+        }
+        break;
+      }
+
+      case 'session.error': {
+        const props = event.properties as
+          | { sessionID?: string; error?: unknown }
+          | undefined;
+        if (props?.sessionID && props.error && isRateLimitError(props.error)) {
+          await this.tryFallback(props.sessionID);
+        }
+        break;
+      }
+
+      case 'session.status': {
+        const props = event.properties as
+          | {
+              sessionID?: string;
+              status?: { type?: string; message?: string };
+            }
+          | undefined;
+        if (!props?.sessionID || props.status?.type !== 'retry') break;
+        const msg = props.status.message?.toLowerCase() ?? '';
+        if (
+          msg.includes('rate limit') ||
+          msg.includes('usage limit') ||
+          msg.includes('usage exceeded') ||
+          msg.includes('quota exceeded') ||
+          msg.includes('high concurrency') ||
+          msg.includes('reduce concurrency')
+        ) {
+          await this.tryFallback(props.sessionID);
+        }
+        break;
+      }
+
+      case 'subagent.session.created': {
+        // Some builds of OpenCode include the agent name here.
+        const props = event.properties as
+          | { sessionID?: string; agentName?: unknown }
+          | undefined;
+        if (props?.sessionID && typeof props.agentName === 'string') {
+          this.sessionAgent.set(props.sessionID, props.agentName);
+        }
+        break;
+      }
+
+      case 'session.deleted': {
+        // Clean up all per-session state to prevent unbounded memory growth
+        // in long-running instances with many subagent sessions.
+        // OpenCode emits two shapes depending on context:
+        //   { properties: { sessionID } }   — subagent / task sessions
+        //   { properties: { info: { id } } } — top-level session deletion
+        // Mirror the same dual-shape lookup used by BackgroundTaskManager.
+        const props = event.properties as
+          | { sessionID?: string; info?: { id?: string } }
+          | undefined;
+        const id = props?.info?.id ?? props?.sessionID;
+        if (id) {
+          this.sessionModel.delete(id);
+          this.sessionAgent.delete(id);
+          this.sessionTried.delete(id);
+          this.inProgress.delete(id);
+          this.lastTrigger.delete(id);
+        }
+        break;
+      }
+    }
+  }
+
+  // ---------------------------------------------------------------------------
+  // Core fallback logic
+  // ---------------------------------------------------------------------------
+
+  private async tryFallback(sessionID: string): Promise<void> {
+    if (!sessionID) return;
+    if (this.inProgress.has(sessionID)) return;
+
+    // Deduplicate: multiple events can fire for a single rate-limit event.
+    const now = Date.now();
+    if (now - (this.lastTrigger.get(sessionID) ?? 0) < DEDUP_WINDOW_MS) return;
+    this.lastTrigger.set(sessionID, now);
+
+    this.inProgress.add(sessionID);
+    try {
+      const currentModel = this.sessionModel.get(sessionID);
+      const agentName = this.sessionAgent.get(sessionID);
+      const chain = this.resolveChain(agentName, currentModel);
+      if (!chain.length) {
+        log('[foreground-fallback] no chain configured', { sessionID, agentName });
+        return;
+      }
+
+      if (!this.sessionTried.has(sessionID)) {
+        this.sessionTried.set(sessionID, new Set());
+      }
+      const tried = this.sessionTried.get(sessionID)!;
+      if (currentModel) tried.add(currentModel);
+
+      const nextModel = chain.find((m) => !tried.has(m));
+      if (!nextModel) {
+        log('[foreground-fallback] fallback chain exhausted', {
+          sessionID,
+          agentName,
+          tried: [...tried],
+        });
+        return;
+      }
+      tried.add(nextModel);
+
+      const ref = parseModel(nextModel);
+      if (!ref) {
+        log('[foreground-fallback] invalid model format', {
+          sessionID,
+          nextModel,
+        });
+        return;
+      }
+
+      // Retrieve the last user message to re-submit with the fallback model.
+      const result = await this.client.session.messages({
+        path: { id: sessionID },
+      });
+      const messages = (result.data ?? []) as Array<{
+        info: { role: string };
+        parts: unknown[];
+      }>;
+      const lastUser = [...messages]
+        .reverse()
+        .find((m) => m.info.role === 'user');
+      if (!lastUser) {
+        log('[foreground-fallback] no user message found', { sessionID });
+        return;
+      }
+
+      // Abort the currently rate-limited prompt so the session becomes idle.
+      try {
+        await this.client.session.abort({ path: { id: sessionID } });
+      } catch {
+        // Session may already be idle; safe to ignore.
+      }
+
+      // Give the server a moment to finalise the abort before re-prompting.
+      await new Promise((r) => setTimeout(r, 500));
+
+      // promptAsync queues the prompt and returns immediately — this avoids
+      // blocking the event handler while waiting for a full LLM response.
+      // Cast required: promptAsync is not in the plugin TypeScript types for
+      // oh-my-opencode-slim but IS present on the real OpenCode client at
+      // runtime (verified by opencode-rate-limit-fallback reference impl).
+      const sessionClient = this.client.session as unknown as {
+        promptAsync: (args: {
+          path: { id: string };
+          body: {
+            parts: unknown[];
+            model: { providerID: string; modelID: string };
+          };
+        }) => Promise<unknown>;
+      };
+      await sessionClient.promptAsync({
+        path: { id: sessionID },
+        body: { parts: lastUser.parts, model: ref },
+      });
+
+      this.sessionModel.set(sessionID, nextModel);
+      log('[foreground-fallback] switched to fallback model', {
+        sessionID,
+        agentName,
+        from: currentModel,
+        to: nextModel,
+      });
+    } catch (err) {
+      log('[foreground-fallback] fallback attempt failed', {
+        sessionID,
+        error: err instanceof Error ? err.message : String(err),
+      });
+    } finally {
+      this.inProgress.delete(sessionID);
+    }
+  }
+
+  // ---------------------------------------------------------------------------
+  // Chain resolution
+  // ---------------------------------------------------------------------------
+
+  /**
+   * Determine the fallback chain to use for a session.
+   *
+   * Priority:
+   * 1. Agent name known AND has a configured chain → return it directly
+   * 2. Agent name known but NO chain configured → return [] (no fallback;
+   *    do NOT bleed into other agents' chains which would re-prompt the
+   *    session with a model belonging to a completely different agent)
+   * 3. Agent name unknown, current model known → search all chains for
+   *    the model to infer which chain to use
+   * 4. Nothing matches → flatten all chains as a last resort (only
+   *    reached when both agent name and current model are unavailable)
+   */
+  private resolveChain(
+    agentName: string | undefined,
+    currentModel: string | undefined,
+  ): string[] {
+    if (agentName) {
+      // Agent is known: use its chain exactly, or no chain at all.
+      // Never fall through to cross-agent chains when the agent is identified.
+      return this.chains[agentName] ?? [];
+    }
+
+    // Agent unknown: try to infer from the current model.
+    if (currentModel) {
+      for (const chain of Object.values(this.chains)) {
+        if (chain.includes(currentModel)) return chain;
+      }
+    }
+
+    // Last resort: merged list across all agents preserving insertion order.
+    // Only reached when both agent name and current model are unavailable.
+    const all: string[] = [];
+    const seen = new Set<string>();
+    for (const chain of Object.values(this.chains)) {
+      for (const m of chain) {
+        if (!seen.has(m)) {
+          seen.add(m);
+          all.push(m);
+        }
+      }
+    }
+    return all;
+  }
+}
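The two pure helpers in this file can be exercised in isolation. This sketch copies their logic (with the pattern list trimmed to a few entries) so the detection and parsing behaviour is checkable outside the plugin:

```typescript
// Trimmed copy of the rate-limit patterns from the file above.
const RATE_LIMIT_PATTERNS = [
  /\b429\b/,
  /rate.?limit/i,
  /too many requests/i,
  /quota.?exceeded/i,
];

// Matches message text and the numeric status code, as in isRateLimitError.
function isRateLimitError(error: unknown): boolean {
  if (!error || typeof error !== 'object') return false;
  const err = error as { message?: string; data?: { statusCode?: number } };
  const text = [err.message ?? '', String(err.data?.statusCode ?? '')].join(' ');
  return RATE_LIMIT_PATTERNS.some((p) => p.test(text));
}

// Splits "providerID/modelID" on the first slash; rejects a missing,
// leading, or trailing slash ("gpt-4o", "/x", "openai/").
function parseModel(
  model: string,
): { providerID: string; modelID: string } | null {
  const slash = model.indexOf('/');
  if (slash <= 0 || slash >= model.length - 1) return null;
  return { providerID: model.slice(0, slash), modelID: model.slice(slash + 1) };
}
```

Note that `parseModel('anthropic/claude-opus-4-5')` keeps everything after the first slash as the `modelID`, so model IDs containing further slashes would survive intact.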

+ 1 - 0
src/hooks/index.ts

@@ -2,6 +2,7 @@ export type { AutoUpdateCheckerOptions } from './auto-update-checker';
export { createAutoUpdateCheckerHook } from './auto-update-checker';
export { createChatHeadersHook } from './chat-headers';
export { createDelegateTaskRetryHook } from './delegate-task-retry';
+export { ForegroundFallbackManager, isRateLimitError } from './foreground-fallback';
export { createJsonErrorRecoveryHook } from './json-error-recovery';
export { createPhaseReminderHook } from './phase-reminder';
export { createPostReadNudgeHook } from './post-read-nudge';
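For reference, a user config that would populate these chains might look like the following. The `fallback.enabled` and `fallback.chains` keys come from this PR's config handling; the agent names match the foreground agents listed in the description, but the model IDs are purely illustrative:

```typescript
// Hypothetical user-facing config fragment (shape inferred from this PR).
const userConfig = {
  fallback: {
    enabled: true, // default; set false to disable both startup merge and runtime failover
    chains: {
      orchestrator: ['anthropic/claude-opus-4-5', 'openai/gpt-4o'],
      oracle: ['openai/gpt-4o', 'google/gemini-2.5-pro'],
    },
  },
};
```

Agents absent from `chains` (and without a `_modelArray`) simply get no runtime fallback, per the isolation rule tested above.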

+ 92 - 4
src/index.ts

@@ -7,6 +7,7 @@ import {
  createAutoUpdateCheckerHook,
  createChatHeadersHook,
  createDelegateTaskRetryHook,
+  ForegroundFallbackManager,
  createJsonErrorRecoveryHook,
  createPhaseReminderHook,
  createPostReadNudgeHook,
@@ -42,6 +43,33 @@ const OhMyOpenCodeLite: Plugin = async (ctx) => {
      modelArrayMap[agentDef.name] = agentDef._modelArray;
    }
  }
+  // Build runtime fallback chains for all foreground agents.
+  // Each chain is an ordered list of model strings to try when the current
+  // model is rate-limited. Seeds from _modelArray entries (when the user
+  // configures model as an array), then appends fallback.chains entries.
+  const runtimeChains: Record<string, string[]> = {};
+  for (const agentDef of agentDefs) {
+    if (agentDef._modelArray?.length) {
+      runtimeChains[agentDef.name] = agentDef._modelArray.map((m) => m.id);
+    }
+  }
+  if (config.fallback?.enabled !== false) {
+    const chains =
+      (config.fallback?.chains as Record<string, string[] | undefined>) ?? {};
+    for (const [agentName, chainModels] of Object.entries(chains)) {
+      if (!chainModels?.length) continue;
+      const existing = runtimeChains[agentName] ?? [];
+      const seen = new Set(existing);
+      for (const m of chainModels) {
+        if (!seen.has(m)) {
+          seen.add(m);
+          existing.push(m);
+        }
+      }
+      runtimeChains[agentName] = existing;
+    }
+  }
+
  // Parse tmux config with defaults
  const tmuxConfig: TmuxConfig = {
    enabled: config.tmux?.enabled ?? false,
@@ -92,6 +120,13 @@ const OhMyOpenCodeLite: Plugin = async (ctx) => {
  // Initialize JSON parse error recovery hook
  const jsonErrorRecoveryHook = createJsonErrorRecoveryHook(ctx);

+  // Initialize foreground fallback manager for runtime model switching
+  const foregroundFallback = new ForegroundFallbackManager(
+    ctx.client,
+    runtimeChains,
+    config.fallback?.enabled !== false && Object.keys(runtimeChains).length > 0,
+  );
+
  return {
    name: 'oh-my-opencode-slim',

@@ -151,17 +186,67 @@ const OhMyOpenCodeLite: Plugin = async (ctx) => {
      }
      const configAgent = opencodeConfig.agent as Record<string, unknown>;

-      // Runtime model fallback: resolve model arrays to the first
-      // provider/model whose provider is configured in OpenCode.
+      // Model resolution for foreground agents: pick the best available model
+      // by combining _modelArray entries with fallback.chains config.
+      //
      // NOTE: We cannot call ctx.client.provider.list() here because
      // the HTTP server is still initializing (causes deadlock).
      // Instead, inspect opencodeConfig.provider directly.
-      if (Object.keys(modelArrayMap).length > 0) {
+      //
+      // NOTE: This is startup-time selection only — it picks the best
+      // available provider at plugin init. Runtime failover on API errors
+      // (e.g. rate limits mid-conversation) is handled separately by
+      // ForegroundFallbackManager via the event hook.
+      const fallbackChainsEnabled = config.fallback?.enabled !== false;
+      const fallbackChains = fallbackChainsEnabled
+        ? ((config.fallback?.chains as Record<string, string[] | undefined>) ??
+          {})
+        : {};
+
+      // Build effective model arrays: seed from _modelArray, then append
+      // fallback.chains entries so the resolver considers the full chain
+      // when picking the best available provider at startup.
+      const effectiveArrays: Record<
+        string,
+        Array<{ id: string; variant?: string }>
+      > = {};
+
+      for (const [agentName, models] of Object.entries(modelArrayMap)) {
+        effectiveArrays[agentName] = [...models];
+      }
+
+      for (const [agentName, chainModels] of Object.entries(fallbackChains)) {
+        if (!chainModels || chainModels.length === 0) continue;
+
+        if (!effectiveArrays[agentName]) {
+          // Agent has no _modelArray — seed from its current string model so
+          // the fallback chain appends after it rather than replacing it.
+          const entry = configAgent[agentName] as
+            | Record<string, unknown>
+            | undefined;
+          const currentModel =
+            typeof entry?.model === 'string' ? entry.model : undefined;
+          effectiveArrays[agentName] = currentModel
+            ? [{ id: currentModel }]
+            : [];
+        }
+
+        const seen = new Set(effectiveArrays[agentName].map((m) => m.id));
+        for (const chainModel of chainModels) {
+          if (!seen.has(chainModel)) {
+            seen.add(chainModel);
+            effectiveArrays[agentName].push({ id: chainModel });
+          }
+        }
+      }
+
+      if (Object.keys(effectiveArrays).length > 0) {
        const providerConfig =
          (opencodeConfig.provider as Record<string, unknown>) ?? {};
        const hasProviderConfig = Object.keys(providerConfig).length > 0;

-        for (const [agentName, modelArray] of Object.entries(modelArrayMap)) {
+        for (const [agentName, modelArray] of Object.entries(effectiveArrays)) {
+          if (modelArray.length === 0) continue;
          let resolved = false;

          if (hasProviderConfig) {
@@ -267,6 +352,9 @@ const OhMyOpenCodeLite: Plugin = async (ctx) => {
    },

     event: async (input) => {
     event: async (input) => {
+      // Runtime model fallback for foreground agents (rate-limit detection)
+      await foregroundFallback.handleEvent(input.event);
+
      // Handle auto-update checking
      await autoUpdateChecker.event(input);
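The `runtimeChains` merge introduced in `src/index.ts` above can also be sketched as a pure function — seed each agent's chain from its `_modelArray` IDs, then append `fallback.chains` entries without duplicates (logic mirrors the loop in this diff; the function name is hypothetical):

```typescript
function mergeChains(
  modelArrays: Record<string, string[]>,
  fallbackChains: Record<string, string[] | undefined>,
): Record<string, string[]> {
  const runtime: Record<string, string[]> = {};
  // Seed from _modelArray ids (agents whose model is configured as an array).
  for (const [agent, models] of Object.entries(modelArrays)) {
    if (models.length) runtime[agent] = [...models];
  }
  // Append fallback.chains entries, skipping models already present.
  for (const [agent, chain] of Object.entries(fallbackChains)) {
    if (!chain?.length) continue;
    const existing = runtime[agent] ?? [];
    const seen = new Set(existing);
    for (const m of chain) {
      if (!seen.has(m)) {
        seen.add(m);
        existing.push(m);
      }
    }
    runtime[agent] = existing;
  }
  return runtime;
}
```

So an agent that appears only in `fallback.chains` still gets a chain, while one that appears in both gets `_modelArray` entries first — preserving the user's primary-model ordering.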