Multi-AI provider abstraction layer.
Supported providers
| Provider | Key Env Variable | Model Env Variable | Default Model |
|---|---|---|---|
| Anthropic Claude | ANTHROPIC_API_KEY | ANTHROPIC_MODEL | claude-sonnet-4-20250514 |
| OpenAI GPT | OPENAI_API_KEY | OPENAI_MODEL | gpt-4o-mini |
| Google Gemini | GOOGLE_API_KEY | GOOGLE_MODEL | gemini-2.5-flash |
| Ollama (local) | AI_PROVIDER=local | OLLAMA_MODEL | mistral:7b |
Detection order: Runtime override (header dropdown) → AI_PROVIDER env var → auto-detect (Anthropic → OpenAI → Google → Ollama).
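The precedence above can be sketched as a small pure function. This is an illustrative sketch, not the module's actual internals: `detectProvider`, `KEY_ENV`, and `CLOUD_DETECT_ORDER` are assumed names, and the Ollama auto-detect condition is simplified to the presence of OLLAMA_MODEL.

```javascript
// Hypothetical sketch of the detection precedence described above.
const CLOUD_DETECT_ORDER = ["anthropic", "openai", "google"];
const KEY_ENV = {
  anthropic: "ANTHROPIC_API_KEY",
  openai: "OPENAI_API_KEY",
  google: "GOOGLE_API_KEY",
};

function detectProvider(runtimeOverride, env) {
  if (runtimeOverride) return runtimeOverride; // 1. header dropdown
  if (env.AI_PROVIDER) return env.AI_PROVIDER; // 2. explicit env var
  for (const p of CLOUD_DETECT_ORDER) {        // 3. auto-detect by API key
    if (env[KEY_ENV[p]]) return p;
  }
  if (env.OLLAMA_MODEL) return "local";        // 4. Ollama last
  return null;                                 // nothing configured
}
```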
Exports
- generateText — Single-shot text generation.
- streamText — Token-streaming text generation (Anthropic/OpenAI; fallback for others).
- parseJSON — Parse AI response text as JSON (strips markdown fences).
- getProvider, hasProvider, isLocalProvider, isProviderDegraded, getProviderName, getProviderMeta — Provider detection.
- setRuntimeKey, setRuntimeOllama, setActiveProvider — Runtime configuration (Settings page).
- getConfiguredKeys — Masked key status for the Settings UI.
- getSupportedProviders — All provider names/models for the UI (derived from runtime config).
- checkOllamaConnection — Ollama connectivity check.
- loadKeysFromDatabase — Restore all persisted keys from DB into the runtime cache (called at startup).
Members
(inner, constant) circuitBreakers :Object.<string, {failures: number, disabledUntil: number}>
Per-provider failure counts and cooldown deadlines for the rate-limit circuit breaker.
Type:
- Object.<string, {failures: number, disabledUntil: number}>
Methods
(static) checkOllamaConnection() → {Promise.<Object>}
Check Ollama server connectivity and verify the configured model is available.
Returns:
Resolves to {ok: boolean, model?: string, baseUrl?: string, availableModels?: string[], error?: string}.
- Type
- Promise.<Object>
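A minimal sketch of what such a check might look like, assuming Ollama's standard GET /api/tags endpoint (which lists installed models) and Node 18+ for global fetch. The function name and exact response shape here are illustrative; the module's real implementation may differ.

```javascript
// Hypothetical connectivity check against a local Ollama server.
async function checkOllama(baseUrl = "http://localhost:11434", model = "mistral:7b") {
  try {
    const res = await fetch(`${baseUrl}/api/tags`); // lists installed models
    if (!res.ok) return { ok: false, error: `HTTP ${res.status}` };
    const { models = [] } = await res.json();
    const names = models.map((m) => m.name);
    if (!names.includes(model)) {
      return { ok: false, availableModels: names, error: `model ${model} not installed` };
    }
    return { ok: true, model, baseUrl };
  } catch (err) {
    // Server unreachable: connection refused, DNS failure, etc.
    return { ok: false, error: err.message };
  }
}
```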
(static) generateText(prompt, optionsopt) → {Promise.<string>}
Generate text from an AI provider (single-shot, non-streaming). Automatically detects the active provider and routes the request.
FEA-003: On rate-limit errors, automatically falls back to the next configured provider in CLOUD_DETECT_ORDER before giving up. Each provider has a circuit breaker that disables it for 5 minutes after a rate-limit failure that survived all internal retries.
Parameters:
| Name | Type | Attributes | Description | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
prompt |
string | Object | Plain string or structured |
|||||||||||||
options |
Object |
<optional> |
Properties
|
- Source:
Throws:
- If no AI provider is configured or all providers fail.
- Type
- Error
Returns:
The generated text response.
- Type
- Promise.<string>
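The FEA-003 retry-and-fallback chain described above can be illustrated with a simplified loop. `generateWithFallback`, `callProvider`, and `isRateLimitErrorLike` are hypothetical stand-ins for the module's internals, and the error patterns are assumptions:

```javascript
// Sketch: try the primary provider, then each same-tier fallback in order.
async function generateWithFallback(prompt, primary, callProvider, fallbacks) {
  const chain = [primary, ...fallbacks];
  let lastErr;
  for (const provider of chain) {
    try {
      return await callProvider(provider, prompt);
    } catch (err) {
      lastErr = err;
      // Only rate limits / quota errors warrant moving to the next provider;
      // anything else (bad key, bad request) propagates immediately.
      if (!isRateLimitErrorLike(err)) throw err;
    }
  }
  throw lastErr; // every provider in the chain was rate-limited
}

function isRateLimitErrorLike(err) {
  return /429|rate.?limit|quota|overloaded/i.test(err.message || "");
}
```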
(static) getConfiguredKeys() → {Object}
Returns masked API keys and Ollama config for the Settings UI. Never returns full keys — only masked versions for display.
Returns:
- Type
- Object
(static) getProvider() → {string|null}
Returns:
Current provider ID ("anthropic", "openai", "google", "local"), or null.
- Type
- string | null
(static) getProviderMeta() → {Object|null}
Returns:
Full provider metadata, or null.
- Type
- Object | null
(static) getProviderName() → {string}
Returns:
Human-readable provider name (e.g. "Claude Sonnet"), or "No provider configured".
- Type
- string
(static) getSupportedProviders() → {Array.<{id: string, name: string, model: string, docsUrl: string}>}
Returns the list of all supported providers with current names/models. Derives from buildProviderMeta() so model names stay in sync with what's actually used in API calls. Consumed by GET /api/config.
Returns:
- Type
- Array.<{id: string, name: string, model: string, docsUrl: string}>
(static) hasProvider() → {boolean}
Returns:
true if any AI provider is configured.
- Type
- boolean
(static) isLocalProvider() → {boolean}
Returns:
true if the active provider is Ollama (local).
- Type
- boolean
(static) isProviderDegraded() → {boolean}
True when the AI provider is operating in a degraded state — either a sticky fallback is active (primary was rate-limited) or the primary provider's circuit breaker is open. Used by the feedback loop to skip expensive AI calls that would block run completion.
Returns:
- Type
- boolean
(static) isRateLimitError(err) → {boolean}
Detect whether an error is a rate limit / quota exhaustion from any AI provider. Used internally for retry decisions and exported for the pipeline to detect rate limits that survived all retries.
Parameters:
| Name | Type | Description |
|---|---|---|
| err | Error | |
Returns:
- Type
- boolean
(static) isTransientServerError(err) → {boolean}
Detect transient server-side failures that warrant retry + provider fallback but aren't rate limits. Common examples:
- Google Gemini 503 "This model is currently experiencing high demand"
- Anthropic 503 "overloaded_error" (already matches isRateLimitError via /overloaded/)
- OpenAI 500/502/504 transient backend errors
- Provider-specific HTTP 5xx with "high demand", "service unavailable", "try again later"
Distinct from isRateLimitError — these are not quota issues, they're temporary server outages. We retry with backoff and fall back to other providers, but don't trip the per-provider rate-limit circuit breaker (the provider's key is fine; the backend is struggling).
Parameters:
| Name | Type | Description |
|---|---|---|
| err | Error | |
Returns:
- Type
- boolean
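An illustrative classifier along these lines might look as follows. The patterns are assumptions derived from the examples listed above, not the module's exact regexes:

```javascript
// Sketch: classify transient server-side failures (not rate limits).
function isTransientServerErrorSketch(err) {
  const msg = String(err?.message || err || "");
  // HTTP 5xx status codes embedded in the error message.
  if (/\b(500|502|503|504)\b/.test(msg)) return true;
  // Provider-specific "server is struggling" phrasings.
  return /high demand|service unavailable|try again later/i.test(msg);
}
```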
(static) loadKeysFromDatabase() → {number}
Restore all persisted API keys and Ollama config from the database into the runtime cache. Called once at server startup after the DB is initialised.
Keys stored in the DB take precedence over the default detection logic only when no matching env var is already set — env vars remain the canonical override so Docker / K8s deployments are unaffected.
Returns:
The number of providers successfully loaded from the database.
- Type
- number
(static) parseJSON(text) → {Object}
Parse AI response text as JSON. Strips markdown code fences if present.
Parameters:
| Name | Type | Description |
|---|---|---|
| text | string | Raw AI response text. |
Throws:
- If the text is not valid JSON after cleanup.
- Type
- SyntaxError
Returns:
Parsed JSON object.
- Type
- Object
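The fence-stripping behavior described above amounts to roughly this (a minimal sketch; the module's actual cleanup may handle more cases):

```javascript
// Sketch: strip leading/trailing markdown code fences, then parse.
function parseJSONSketch(text) {
  const cleaned = text
    .replace(/^```(?:json)?\s*/i, "") // leading fence, optional "json" tag
    .replace(/\s*```\s*$/, "")        // trailing fence
    .trim();
  return JSON.parse(cleaned);         // throws SyntaxError if still invalid
}
```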
(static) setActiveProvider(provider)
Override the active provider selection (used by the quick-switch dropdown). The provider must already have a valid key/config — this does not set any key.
Parameters:
| Name | Type | Description |
|---|---|---|
| provider | string \| null | Provider ID to pin, or null to resume auto-detect. |
(static) setRuntimeKey(provider, key)
Set an AI provider API key at runtime (via Settings page). Persists the key to the database so it survives server restarts. Pass an empty string to clear the key both in-memory and in the DB.
Parameters:
| Name | Type | Description |
|---|---|---|
| provider | string | |
| key | string | The API key string, or an empty string to clear it. |
(static) setRuntimeOllama(optsopt)
Configure Ollama runtime settings (via Settings page). Persists the config to the database so it survives server restarts.
Parameters:
| Name | Type | Attributes | Description |
|---|---|---|---|
| opts | Object | <optional> | |
(static) streamText(promptOrMessages, onToken, optionsopt) → {Promise.<string>}
Token-streaming variant of generateText. Calls onToken(string) for each token as it arrives. Returns the full accumulated text when the stream completes.
Error handling
If the streaming call fails with a retryable error (rate limit or transient 5xx) BEFORE any tokens are emitted, we transparently retry via generateText() — which applies the full FEA-003 retry + fallback chain and emits the full response as a single synthetic "token". Once tokens have started flowing we can't safely fall back (the user would see two partial responses), so mid-stream failures propagate as-is.
Google and Ollama providers never start a real stream — they always delegate to generateText() (their SDKs don't support incremental streaming from this codebase), so they get fallback for free.
Parameters:
| Name | Type | Attributes | Description |
|---|---|---|---|
| promptOrMessages | string \| Object | | Plain string or structured messages. |
| onToken | function | | Callback invoked for each token. |
| options | Object | <optional> | |
Throws:
- If no AI provider is configured.
- Type
- Error
Returns:
The full accumulated response text.
- Type
- Promise.<string>
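The pre-first-token fallback guard described above can be sketched as follows, with hypothetical `startStream` / `fallbackOnce` callbacks standing in for the provider stream and the generateText() retry path:

```javascript
// Sketch: fall back only if the stream fails before any token was emitted.
async function streamWithGuard(startStream, fallbackOnce, onToken) {
  let emitted = false;
  try {
    return await startStream((tok) => {
      emitted = true;
      onToken(tok);
    });
  } catch (err) {
    if (emitted) throw err;     // mid-stream: can't safely retry
    const full = await fallbackOnce(); // full retry + fallback chain
    onToken(full);              // single synthetic "token"
    return full;
  }
}
```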
(inner) composeSignal(external, timeoutMs) → {Object}
Compose an AbortSignal that fires on EITHER the external signal (user abort) OR a per-call timeout — whichever comes first. Returns the composite signal and a cleanup function that MUST be called in a finally block to prevent the timeout from leaking if the call completes before the deadline.
Parameters:
| Name | Type | Description |
|---|---|---|
| external | AbortSignal \| undefined | Signal from runWithAbort (user abort). |
| timeoutMs | number | Per-call deadline. |
Returns:
{ signal: AbortSignal, cleanup: Function }
- Type
- Object
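One plausible implementation of this pattern, assuming only the standard AbortController API (the module's real version may differ in details such as the timeout reason):

```javascript
// Sketch: composite signal that aborts on user abort OR timeout, plus cleanup.
function composeSignalSketch(external, timeoutMs) {
  const controller = new AbortController();
  const onAbort = () => controller.abort(external.reason);
  if (external) {
    if (external.aborted) controller.abort(external.reason);
    else external.addEventListener("abort", onAbort, { once: true });
  }
  const timer = setTimeout(
    () => controller.abort(new Error(`timed out after ${timeoutMs}ms`)),
    timeoutMs
  );
  // MUST be called in a finally block so the timer doesn't leak.
  const cleanup = () => {
    clearTimeout(timer);
    if (external) external.removeEventListener("abort", onAbort);
  };
  return { signal: controller.signal, cleanup };
}
```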
(inner) getFallbackProviders(primaryProvider) → {Array.<string>}
FEA-003: Get the ordered list of fallback providers to try when the primary provider hits a rate limit or transient error.
Same-tier only — cloud primary falls back to other cloud providers; local primary has no fallback. This prevents cross-tier mismatches where a prompt built for cloud (~1600 chars, 128K context assumed) gets delivered to Ollama (4K context, needs >120s to process) and hits the chat timeout. Ollama is never a cross-tier rescue — the prompt shape, context window, and response latency are too different.
To use Ollama as a primary, set AI_PROVIDER=local or pick it from the provider dropdown — detectProvider() will route all calls to Ollama with the correct tier-specific prompt.
Parameters:
| Name | Type | Description |
|---|---|---|
| primaryProvider | string | The provider that failed. |
Returns:
Ordered list of same-tier fallback provider IDs.
- Type
- Array.<string>
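The same-tier rule reduces to a short filter. A sketch, with `fallbacksFor` and the cloud list as illustrative names:

```javascript
// Sketch: cloud primaries fall back to the other cloud providers in order;
// a local (Ollama) primary never falls back cross-tier.
const CLOUD = ["anthropic", "openai", "google"];

function fallbacksFor(primary) {
  if (primary === "local") return [];        // local has no fallback
  return CLOUD.filter((p) => p !== primary); // other cloud providers, in order
}
```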
(inner) getUserConfiguredKey(envName) → {string}
Get a user-configured key WITHOUT the demo fallback. Used by getConfiguredKeys() so BYOK detection is accurate.
Parameters:
| Name | Type | Description |
|---|---|---|
| envName | string | |
Returns:
- Type
- string
(inner) hasOllamaConfig()
True if Ollama has any config (runtime or env) hinting it should be auto-detected.
(inner) isCircuitBreakerOpen(provider) → {boolean}
Check whether a provider's circuit breaker is open (disabled).
Parameters:
| Name | Type | Description |
|---|---|---|
| provider | string | |
Returns:
true if the provider is temporarily disabled.
- Type
- boolean
(inner) isProviderUsable(provider) → {boolean}
Check whether a provider is usable right now (has a key or, for Ollama, is not disabled). Single source of truth — used by detectProvider, the quick-switch override, and the forced-env path.
Parameters:
| Name | Type | Description |
|---|---|---|
| provider | string | |
Returns:
- Type
- boolean
(inner) isRetryableError()
True if the error should be retried — either a rate limit (quota issue) or a transient server error (provider outage).
(inner) recordProviderFailure(provider)
Record a rate-limit failure for a provider. If the threshold is reached, the provider is disabled for CIRCUIT_BREAKER_COOLDOWN_MS.
Parameters:
| Name | Type | Description |
|---|---|---|
| provider | string | |
(inner) recordProviderSuccess(provider)
Record a successful call — resets the failure counter.
Parameters:
| Name | Type | Description |
|---|---|---|
| provider | string | |
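Taken together, the circuit-breaker helpers (isCircuitBreakerOpen, recordProviderFailure, recordProviderSuccess) amount to roughly this pattern. The threshold value here is an assumption; the cooldown mirrors the 5-minute CIRCUIT_BREAKER_COOLDOWN_MS behavior described for generateText:

```javascript
// Sketch of the per-provider circuit breaker (constants are illustrative).
const FAILURE_THRESHOLD = 3;          // assumed; the module's value may differ
const COOLDOWN_MS = 5 * 60 * 1000;    // 5-minute cooldown, per the docs above
const breakers = {};                  // provider -> { failures, disabledUntil }

function isOpen(provider, now = Date.now()) {
  const b = breakers[provider];
  return !!b && b.disabledUntil > now; // open = temporarily disabled
}

function recordFailure(provider, now = Date.now()) {
  const b = (breakers[provider] ??= { failures: 0, disabledUntil: 0 });
  if (++b.failures >= FAILURE_THRESHOLD) b.disabledUntil = now + COOLDOWN_MS;
}

function recordSuccess(provider) {
  breakers[provider] = { failures: 0, disabledUntil: 0 }; // reset counter
}
```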