managarten/docs/architecture/BYOK_PLAN.md
Till JS a33857fa39 feat(llm): add BYOK tier + 4 provider adapters (OpenAI, Anthropic, Gemini, Mistral)
Phase 1-3 of BYOK support. Introduces a 5th LLM tier 'byok' that
routes to user-provided API keys via direct browser fetches.

shared-llm additions:
- LlmTier extended with 'byok' (rank 3, between mana-server and cloud)
- ByokBackend: LlmBackend implementation that delegates key lookup
  to an app-provided resolver callback, then dispatches to the right
  provider adapter
- 4 provider adapters:
  - OpenAI (gpt-5, gpt-4o, o1 family)
  - Anthropic (Claude Opus/Sonnet/Haiku 4.6) with CORS header
  - Gemini (2.5 Pro/Flash) — REST API with different message format
  - Mistral — OpenAI-compatible, reuses shared openai-compat adapter
- Pricing table for 20+ models with USD per 1M tokens
- estimateCost() + formatCost() helpers

Keys stay device-local (IndexedDB in next phase). Browser-direct
fetches mean keys never touch Mana's server.

Updates two existing tier maps (memoro DetailView, SourceBadge) to
include the new tier.

Planning doc at docs/architecture/BYOK_PLAN.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 15:06:48 +02:00

BYOK — Bring Your Own Key

Architecture and implementation plan for user-provided API keys. Status: planning (2026-04-14)

Goals

  • Users store their own API keys (OpenAI, Anthropic, Gemini, Mistral)
  • Keys are stored encrypted in IndexedDB (user master key, AES-GCM)
  • Keys never leave the device except in browser-direct calls to the provider
  • The orchestrator uses BYOK as a 5th tier alongside browser/mana-server/cloud
  • Per-call cost estimation via a pricing table
  • Multiple keys per provider (label-based, one marked isDefault)

Architecture

User (Browser)
    |
    v
CompanionChat / any LLM task
    |
    v
LlmOrchestrator.run(task, input)
    |
    v
tier === 'byok' ?
    |
    v
ByokBackend
    |
    v
getByokKey(provider)  [callback provided at app init]
    |
    v
byokKeyVault (IndexedDB, encrypted)
    |
    v
  decrypt via user master key
    |
    v
ByokBackend.callProvider(provider, key, messages)
    |
    v
Provider-specific adapter (openai/anthropic/gemini/mistral)
    |
    v
direct HTTPS to api.openai.com / api.anthropic.com / ...

Tier placement

New tier order (ranked by "where data goes"):

none         (0) — stays on device
browser      (1) — stays on device
mana-server  (2) — Mana's own infrastructure
byok         (3) — User's third-party accounts (user-controlled)
cloud        (4) — Mana's cloud (charges user's Mana credits)

Reasoning: byok sits between mana-server and cloud because it leaves the user's network but goes to an account the user manages personally. cloud is last because it costs the user Mana credits.
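
The ranking above could be expressed as a lookup table plus two small helpers. This is a minimal sketch: `TIER_RANK`, `sortByPrivacy`, and `tiersUpTo` are illustrative names and are not confirmed identifiers from shared-llm.

```typescript
// Tier ranks as listed above; a higher rank means data travels further from the device.
type LlmTier = 'none' | 'browser' | 'mana-server' | 'byok' | 'cloud';

// Hypothetical name: the real tiers.ts may structure this differently.
const TIER_RANK: Record<LlmTier, number> = {
  none: 0,
  browser: 1,
  'mana-server': 2,
  byok: 3,
  cloud: 4,
};

// Sort tiers from lowest rank (stays on device) to highest rank.
function sortByPrivacy(tiers: LlmTier[]): LlmTier[] {
  return [...tiers].sort((a, b) => TIER_RANK[a] - TIER_RANK[b]);
}

// Keep only tiers whose rank does not exceed the given ceiling.
function tiersUpTo(ceiling: LlmTier, tiers: LlmTier[]): LlmTier[] {
  return tiers.filter((t) => TIER_RANK[t] <= TIER_RANK[ceiling]);
}
```

With this table, a setting such as "never go beyond byok" reduces to `tiersUpTo('byok', allowedTiers)`.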

Files to create

packages/shared-llm/src/
  tiers.ts                          → extend with 'byok'
  types.ts                          → ByokKeyResolver callback type
  backends/
    byok.ts                         → ByokBackend class
    byok-providers/
      openai.ts                     → OpenAI API adapter
      anthropic.ts                  → Anthropic API adapter
      gemini.ts                     → Gemini REST adapter
      mistral.ts                    → Mistral API adapter (OpenAI-compat)
      types.ts                      → ByokProvider interface
  pricing.ts                        → per-model token pricing
  store.svelte.ts                   → register ByokBackend

apps/mana/apps/web/src/
  lib/byok/
    types.ts                        → ByokKey interface
    vault.ts                        → encrypted IndexedDB CRUD
    store.svelte.ts                 → reactive Svelte store
    init.ts                         → wire key resolver into ByokBackend
  routes/(app)/settings/ai-keys/
    +page.svelte                    → management UI

apps/mana/apps/web/src/lib/data/database.ts
  → add _byokKeys table (v15 schema)

Data model

// packages/shared-llm/src/backends/byok-providers/types.ts
export type ByokProviderId = 'openai' | 'anthropic' | 'gemini' | 'mistral';

export interface ByokProvider {
  id: ByokProviderId;
  displayName: string;
  defaultModel: string;
  availableModels: string[];
  needsDangerousHeader?: boolean;  // Anthropic
  /** Call the provider with the user's key, return GenerateResult */
  call(opts: {
    apiKey: string;
    model: string;
    messages: ChatMessage[];
    temperature?: number;
    maxTokens?: number;
    onToken?: (token: string) => void;
  }): Promise<GenerateResult>;
}

// apps/mana/apps/web/src/lib/byok/types.ts
export interface ByokKey {
  id: string;
  provider: ByokProviderId;
  label: string;                 // "Work Anthropic"
  keyCipher: string;             // AES-GCM encrypted
  keyIv: string;                 // init vector
  model?: string;                // override default model
  isDefault: boolean;
  createdAt: string;
  updatedAt: string;
  lastUsedAt?: string;
  usageCount: number;
  totalTokens: number;
  deletedAt?: string;
}
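
Because several keys can exist per provider, something has to decide which record gets decrypted for a call. The sketch below shows one plausible selection order (exact label, then the default key, then any remaining key) while skipping soft-deleted records. `pickKey` and `ByokKeyMeta` are hypothetical names, and the fallback order is an assumption rather than the plan's stated behaviour.

```typescript
type ByokProviderId = 'openai' | 'anthropic' | 'gemini' | 'mistral';

// Subset of the ByokKey fields from the data model that matter for selection.
interface ByokKeyMeta {
  id: string;
  provider: ByokProviderId;
  label: string;
  isDefault: boolean;
  deletedAt?: string;
}

// Hypothetical helper: choose which stored key a resolver should decrypt.
function pickKey(
  keys: ByokKeyMeta[],
  provider: ByokProviderId,
  preferredLabel?: string,
): ByokKeyMeta | null {
  // Ignore soft-deleted keys and keys that belong to other providers.
  const live = keys.filter((k) => k.provider === provider && !k.deletedAt);
  if (preferredLabel !== undefined) {
    const byLabel = live.find((k) => k.label === preferredLabel);
    if (byLabel) return byLabel;
  }
  // Fall back to the default key, then to the first live key, then to null.
  return live.find((k) => k.isDefault) ?? live[0] ?? null;
}
```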

Key resolver callback

The backend lives in shared-llm but keys live in the app's IndexedDB. We inject a resolver at app init:

// packages/shared-llm/src/backends/byok.ts
export type ByokKeyResolver = (
  provider: ByokProviderId,
  preferredLabel?: string,
) => Promise<{ apiKey: string; model: string } | null>;

export class ByokBackend implements LlmBackend {
  readonly tier = 'byok' as const;
  constructor(
    private resolver: ByokKeyResolver,
    private providers: Map<ByokProviderId, ByokProvider>,
  ) {}
  // ...
}

// apps/mana/apps/web/src/lib/byok/init.ts (app init)
import { llmOrchestrator } from '@mana/shared-llm';
import { getKeyForProvider } from './store.svelte';

llmOrchestrator.registerByokResolver(getKeyForProvider);
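
To make the injection concrete, here is a minimal sketch of how the backend might combine the resolver with the adapter registry. The method name `generate`, the reduced `GenerateResult` shape, and the error messages are assumptions; the real `LlmBackend` interface is not shown in this plan.

```typescript
type ByokProviderId = 'openai' | 'anthropic' | 'gemini' | 'mistral';

// Assumed minimal shapes; the real shared-llm types may carry more fields.
interface ChatMessage { role: 'system' | 'user' | 'assistant'; content: string }
interface GenerateResult { text: string }

type ByokKeyResolver = (
  provider: ByokProviderId,
  preferredLabel?: string,
) => Promise<{ apiKey: string; model: string } | null>;

interface ByokProvider {
  id: ByokProviderId;
  call(opts: { apiKey: string; model: string; messages: ChatMessage[] }): Promise<GenerateResult>;
}

class ByokBackendSketch {
  private resolver: ByokKeyResolver;
  private providers: Map<ByokProviderId, ByokProvider>;

  constructor(resolver: ByokKeyResolver, providers: Map<ByokProviderId, ByokProvider>) {
    this.resolver = resolver;
    this.providers = providers;
  }

  // Hypothetical method: look up the adapter, resolve the key, then dispatch.
  async generate(providerId: ByokProviderId, messages: ChatMessage[]): Promise<GenerateResult> {
    const provider = this.providers.get(providerId);
    if (!provider) throw new Error(`No adapter registered for ${providerId}`);
    // The resolver is the only place that touches the encrypted vault.
    const resolved = await this.resolver(providerId);
    if (!resolved) throw new Error(`No BYOK key stored for ${providerId}`);
    return provider.call({ apiKey: resolved.apiKey, model: resolved.model, messages });
  }
}
```

Keeping the vault behind a callback means shared-llm never imports IndexedDB or crypto code, which also makes the backend easy to exercise with a fake resolver.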

Provider adapters

OpenAI (CORS-friendly)

fetch('https://api.openai.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${apiKey}`,
  },
  body: JSON.stringify({
    model, messages, temperature, max_tokens: maxTokens, stream: true,
  }),
});
// SSE streaming response

Anthropic (needs dangerous header)

fetch('https://api.anthropic.com/v1/messages', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'x-api-key': apiKey,
    'anthropic-version': '2023-06-01',
    'anthropic-dangerous-direct-browser-access': 'true',
  },
  body: JSON.stringify({ model, messages, max_tokens: maxTokens, stream: true }),
});
// SSE streaming with different event schema than OpenAI
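
The request body also differs from OpenAI's: the Messages API takes the system prompt as a top-level `system` field and only accepts `user` and `assistant` roles in `messages`, and `max_tokens` is required. A minimal conversion sketch, assuming `ChatMessage` has the shape shown (the real shared-llm type may differ); `toAnthropicBody` and the 1024 default are hypothetical.

```typescript
// Assumed shape of shared-llm's ChatMessage.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

interface AnthropicBody {
  model: string;
  max_tokens: number;
  stream: boolean;
  system?: string;
  messages: { role: 'user' | 'assistant'; content: string }[];
}

// Hypothetical helper: lift system messages out of the list and into `system`.
function toAnthropicBody(model: string, messages: ChatMessage[], maxTokens = 1024): AnthropicBody {
  const system = messages
    .filter((m) => m.role === 'system')
    .map((m) => m.content)
    .join('\n\n');
  const turns: AnthropicBody['messages'] = [];
  for (const m of messages) {
    if (m.role !== 'system') turns.push({ role: m.role, content: m.content });
  }
  const body: AnthropicBody = { model, max_tokens: maxTokens, stream: true, messages: turns };
  if (system) body.system = system; // omit the field when there is no system prompt
  return body;
}
```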

Gemini (REST with key in URL)

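fetch(`https://generativelanguage.googleapis.com/v1beta/models/${model}:streamGenerateContent?alt=sse&key=${apiKey}`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    contents: messagesToGeminiFormat(messages),
    generationConfig: { temperature, maxOutputTokens: maxTokens },
  }),
});
// alt=sse requests SSE framing; without it the stream is one JSON array.
// Different message format: roles are 'user'/'model', text sits in parts[].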
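
`messagesToGeminiFormat` is referenced above but not defined in this plan. A minimal sketch of what it might do: Gemini's `contents` entries use the roles `user` and `model` and wrap text in a `parts` array. The `ChatMessage` shape is an assumption, and dropping system messages here is a simplification; they would need to be sent separately (Gemini has a dedicated system instruction field).

```typescript
// Assumed shape of shared-llm's ChatMessage.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

interface GeminiContent {
  role: 'user' | 'model';
  parts: { text: string }[];
}

// Sketch: map chat roles to Gemini roles and wrap each message in parts[].
function messagesToGeminiFormat(messages: ChatMessage[]): GeminiContent[] {
  const contents: GeminiContent[] = [];
  for (const m of messages) {
    if (m.role === 'system') continue; // not part of `contents`; handled separately
    contents.push({
      role: m.role === 'assistant' ? 'model' : 'user',
      parts: [{ text: m.content }],
    });
  }
  return contents;
}
```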

Mistral (OpenAI-compatible)

fetch('https://api.mistral.ai/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${apiKey}`,
  },
  body: JSON.stringify({
    model, messages, temperature, max_tokens: maxTokens, stream: true,
  }),
});
// Same as OpenAI, can reuse adapter

Pricing (for cost estimation)

// packages/shared-llm/src/pricing.ts
export const PRICING: Record<string, { inputPer1k: number; outputPer1k: number }> = {
  // OpenAI (USD per 1K tokens)
  'gpt-5': { inputPer1k: 0.015, outputPer1k: 0.060 },
  'gpt-4o': { inputPer1k: 0.005, outputPer1k: 0.020 },
  'gpt-4o-mini': { inputPer1k: 0.0003, outputPer1k: 0.0012 },
  // Anthropic
  'claude-opus-4.6': { inputPer1k: 0.015, outputPer1k: 0.075 },
  'claude-sonnet-4.6': { inputPer1k: 0.003, outputPer1k: 0.015 },
  // Gemini
  'gemini-2.5-pro': { inputPer1k: 0.00125, outputPer1k: 0.005 },
  'gemini-2.5-flash': { inputPer1k: 0.00015, outputPer1k: 0.0006 },
  // Mistral
  'mistral-large-latest': { inputPer1k: 0.002, outputPer1k: 0.006 },
  'mistral-small-latest': { inputPer1k: 0.0002, outputPer1k: 0.0006 },
};

export function estimateCost(model: string, promptTokens: number, completionTokens: number): number {
  const p = PRICING[model];
  if (!p) return 0;
  return (promptTokens / 1000) * p.inputPer1k + (completionTokens / 1000) * p.outputPer1k;
}
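
As a worked example using the table above: a gpt-4o call with 1,200 prompt tokens and 300 completion tokens costs 1.2 × 0.005 + 0.3 × 0.020 = 0.006 + 0.006 = 0.012 USD. The sketch below repeats that calculation in code. `formatCost` is named in the commit message but not defined in this plan, so the version here is a guess at one reasonable display format.

```typescript
// Two entries copied from the pricing table above (USD per 1K tokens).
const PRICING: Record<string, { inputPer1k: number; outputPer1k: number }> = {
  'gpt-4o': { inputPer1k: 0.005, outputPer1k: 0.020 },
  'gemini-2.5-flash': { inputPer1k: 0.00015, outputPer1k: 0.0006 },
};

function estimateCost(model: string, promptTokens: number, completionTokens: number): number {
  const p = PRICING[model];
  if (!p) return 0; // unknown model: no estimate available
  return (promptTokens / 1000) * p.inputPer1k + (completionTokens / 1000) * p.outputPer1k;
}

// Hypothetical formatter: tiny non-zero amounts are shown as "<$0.01".
function formatCost(usd: number): string {
  if (usd === 0) return '$0.00';
  if (usd < 0.01) return '<$0.01';
  return `$${usd.toFixed(2)}`;
}

// About 0.012 USD, up to floating-point rounding.
const exampleCost = estimateCost('gpt-4o', 1200, 300);
const exampleLabel = formatCost(exampleCost); // "$0.01"
```

Note that returning 0 for an unknown model is indistinguishable from a free call, so the UI may want to treat a missing pricing entry as "unknown" rather than "$0.00".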

Privacy rules

// In orchestrator routing
if (task.contentClass === 'sensitive') {
  // BYOK blocked by default — leaves device to third-party
  candidates = candidates.filter(t => t !== 'byok');
}
// User can opt-in per provider via
// settings.byok.sensitiveOptIn = ['anthropic']
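
The snippet above always removes byok for sensitive content and leaves the opt-in as a comment. One way the two could be combined is sketched below. `filterCandidates` and its parameters are hypothetical, and it assumes the orchestrator already knows which BYOK provider would serve the call.

```typescript
type LlmTier = 'none' | 'browser' | 'mana-server' | 'byok' | 'cloud';
type ByokProviderId = 'openai' | 'anthropic' | 'gemini' | 'mistral';

// Hypothetical helper: block byok for sensitive tasks unless the user
// explicitly opted in for the provider that would receive the content.
function filterCandidates(
  candidates: LlmTier[],
  contentClass: string,
  byokProvider: ByokProviderId | undefined,
  sensitiveOptIn: ByokProviderId[],
): LlmTier[] {
  if (contentClass !== 'sensitive') return candidates;
  const optedIn = byokProvider !== undefined && sensitiveOptIn.includes(byokProvider);
  return optedIn ? candidates : candidates.filter((t) => t !== 'byok');
}
```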

Settings schema extensions

// LlmSettings (in shared-llm/src/types.ts)
export interface LlmSettings {
  allowedTiers: LlmTier[];
  taskOverrides: Record<string, LlmTier>;  // + 'byok' now valid
  fallbackToRulesOnError: boolean;
  showSourceInUi: boolean;
  cloudConsentGiven: boolean;
  // NEW:
  byok?: {
    defaultProvider?: ByokProviderId;
    sensitiveOptIn: ByokProviderId[];  // explicit consent for sensitive content
    preferredModel?: Partial<Record<ByokProviderId, string>>;  // per-provider model override (not every provider needs one)
  };
}

Implementation order

Phase 1: Foundation (1.5h)

  1. Extend LlmTier with 'byok' in shared-llm
  2. Create ByokKey vault (IndexedDB + encrypt/decrypt)
  3. ByokBackend skeleton with provider registry
  4. Wire into orchestrator

Phase 2: First provider (30min)

  5. OpenAI adapter (simplest, CORS ok)
  6. Test via companion chat

Phase 3: More providers (1.5h)

  7. Anthropic adapter (with dangerous-header)
  8. Gemini adapter (different message format)
  9. Mistral adapter (OpenAI-compatible, trivial)

Phase 4: UI (1.5h)

  10. Settings/ai-keys page
  11. Add + edit + delete key modals
  12. Usage tracking (increment on each call)

Phase 5: Polish (30min)

  13. Pricing table + cost estimation
  14. Companion toolbar dropdown extension (BYOK options)

Total: ~5.5h

Decisions

| Question | Decision |
| --- | --- |
| Browser-direct vs. server-proxy? | Browser-direct primary. No server-proxy fallback in v1; if CORS blocks, show an error with a link to the docs. |
| Providers in v1 | OpenAI, Anthropic, Gemini, Mistral |
| Multiple keys per provider | Yes, one isDefault, others by label |
| Cost estimation | Yes, hardcoded pricing table (update manually) |
| Ollama BYOK (self-hosted) | Skip for v1 |
| Sensitive content + BYOK | Blocked by default, explicit per-provider opt-in |
| Key encryption | AES-GCM-256 via user master key (existing vault) |
| Key sync across devices | No. Keys stay device-local (user must add them on each device). |