mirror of
https://github.com/Memo-2023/mana-monorepo.git
synced 2026-05-14 20:21:09 +02:00
First milestone of the LLM-fallback plan (docs/plans/llm-fallback-aliases.md). Introduces the `mana/<class>` namespace; the registry parses and validates aliases.yaml at startup and reloads on demand. Schema-rejects empty chains, missing provider prefixes, alias names outside the reserved namespace, default→unknown references, etc.

Reload semantics: a parse error keeps the previous good state in memory, so a typo + SIGHUP doesn't take the service down.

Five aliases ship with the initial config: fast-text, long-form, structured, reasoning, vision. Each chain ends with a cloud provider so the system keeps working when the GPU server is offline.

32 unit tests cover the happy path, schema validation, the namespace check, reload safety, and a guard that the shipped aliases.yaml itself parses. M2 (health-cache + probe-loop) and M3 (router fallback execution) build on this; aliases are not yet wired into the request path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
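The reload guarantee described above (a failed parse never clobbers the last good state) can be sketched as follows. This is an illustrative Python sketch, not the real mana-llm code: `AliasRegistry`, `reload`, and the loader callable are assumed names, and YAML parsing is abstracted behind a loader so the validation logic stays in view.

```python
# Illustrative sketch (assumed names, not the real registry API) of the
# reload semantics: validate first, swap state only on success.

class AliasRegistry:
    """Holds the validated alias table; reload never clobbers good state."""

    NAMESPACE = "mana/"

    def __init__(self, loader):
        # loader: callable returning the parsed aliases.yaml as a dict.
        self._loader = loader
        # Initial load: a validation error here is fatal, by design.
        self.aliases, self.default = self._validate(loader())

    def reload(self):
        """SIGHUP handler body: on error, keep the previous good state."""
        try:
            aliases, default = self._validate(self._loader())
        except ValueError:
            return False  # previous state stays in memory
        self.aliases, self.default = aliases, default
        return True

    def _validate(self, doc):
        aliases = doc.get("aliases") or {}
        for name, spec in aliases.items():
            if not name.startswith(self.NAMESPACE):
                raise ValueError(f"alias outside reserved namespace: {name}")
            chain = spec.get("chain") or []
            if not chain:
                raise ValueError(f"empty chain: {name}")
            if any("/" not in entry for entry in chain):
                raise ValueError(f"missing provider prefix in {name}")
        default = doc.get("default")
        if default not in aliases:
            raise ValueError(f"default references unknown alias: {default}")
        return aliases, default
```

A reload that hits a schema violation (e.g. an empty chain) returns `False` and leaves `self.aliases` untouched, which is the "typo + SIGHUP doesn't take the service down" behavior.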
54 lines
1.9 KiB
YAML
# mana-llm Model Aliases — single source of truth for which class of
# model each backend feature uses.
#
# Consumers (mana-api, mana-ai, …) send `"model": "mana/<class>"` in
# their /v1/chat/completions requests; mana-llm resolves the alias to
# the chain below and tries entries in order, skipping providers that
# the health-cache has marked unhealthy.
#
# Order in `chain` = preference. First healthy entry wins. Each chain
# should end with a cloud provider so the system stays functional even
# when the local GPU server (mana-gpu, RTX 3090) is offline.
#
# Reload at runtime: `kill -HUP <pid>` after editing — no restart needed.
# Reference: docs/plans/llm-fallback-aliases.md.

aliases:
  mana/fast-text:
    description: "Short answers, classification, single-shot Q&A"
    chain:
      - ollama/qwen2.5:7b
      - groq/llama-3.1-8b-instant
      - openrouter/anthropic/claude-3-haiku

  mana/long-form:
    description: "Writing, essays, stories, longer prose"
    chain:
      - ollama/gemma3:12b
      - groq/llama-3.3-70b-versatile
      - openrouter/anthropic/claude-3.5-haiku

  mana/structured:
    description: "JSON output (comic storyboards, research subqueries, tag suggestions)"
    chain:
      - ollama/qwen2.5:7b
      - groq/llama-3.1-8b-instant
      - openrouter/openai/gpt-4o-mini

  mana/reasoning:
    description: "Agent missions, tool calls, multi-step plans"
    # Cloud first by design — local 4-7B models are unreliable for tool calls
    chain:
      - openrouter/anthropic/claude-3.5-sonnet
      - groq/llama-3.3-70b-versatile

  mana/vision:
    description: "Multimodal (image + text)"
    chain:
      - ollama/llava:7b
      - google/gemini-2.0-flash-exp
      - openrouter/openai/gpt-4o

# Default alias used when a request omits `model` or sends an unknown
# value with no provider prefix. Keep this conservative (cheap class).
default: mana/fast-text
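The "first healthy entry wins" rule from the header comments can be sketched as a small resolver. This is a hypothetical illustration, not the shipped router (which, per the commit message, does not yet wire aliases into the request path); `resolve_model` and the `is_healthy` predicate standing in for the health-cache are assumed names.

```python
# Hypothetical sketch of chain resolution: walk the chain in order and
# return the first entry whose provider the health-cache reports healthy.

def resolve_model(chain, is_healthy):
    """chain: list like ["ollama/qwen2.5:7b", "groq/llama-3.1-8b-instant"].
    is_healthy: predicate over the provider prefix (the part before "/")."""
    for entry in chain:
        provider = entry.split("/", 1)[0]
        if is_healthy(provider):
            return entry
    raise RuntimeError("no healthy provider in chain")
```

With the mana/fast-text chain and the local ollama server marked unhealthy, resolution falls through to `groq/llama-3.1-8b-instant`, which is why each chain is supposed to end with a cloud provider.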