L-1+L-2+L-3: mana-swift-llm initial: lift from memoro-native
New Swift package with local LLM backends for all native mana e.V. apps. It lifts the files that previously lived only in Memoro under `memoro-native/Sources/Core/AI/` and adds two new layers: ManaSharedModels (App Group container helper) and the ManaLLM facade.

Library products:
- ManaLLM: backend abstraction (FoundationModels, Gemma 4 E2B/E4B, NoOp), router with a priority list, and a high-level facade `ManaLLM.summarize/generate/classify` with fast/creative/deep levels.
- ManaLLMShared: App Group `group.ev.mana.models` container, HF_HUB_CACHE setup, legacy fallback when the group is missing.

Lift adjustments compared to memoro:
- public markers on protocol + types + actors
- generic `generate(prompt:instructions:maxTokens:)` added to the LLMBackend protocol; `summarize` is a default implementation built on generate
- AppleFMBackend keeps its optimized @Generable summary path
- GemmaBackend uses ManaSharedModels.effectiveCacheURL() instead of its own Application Support path; allowsCellular is now an initializer parameter instead of an app-settings lookup
- LLMRouter: Memoro-specific user-pref-store logic replaced by a priority-list API
- LLMLog subsystem `ev.mana.llm` instead of the app-owned `Log.ai`

Build: `swift build` clean (76 s, MLX toolchain resolution on the first run). 4/4 parser tests green.
Docs: ../mana/docs/MANA_LLM.md (platform SOT), CLAUDE.md (conventions + lift table).
Follow-up: L-4 move Memoro to ManaLLM, L-5 pageta pilot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
commit
fd376bbdce
13 changed files with 1354 additions and 0 deletions
11
.gitignore
vendored
Normal file
@@ -0,0 +1,11 @@
.DS_Store
.build/
DerivedData/
*.xcuserdatad/
xcuserdata/
*.xcuserstate
.swiftpm/
Package.resolved

*.log
*.xcresult
114
CLAUDE.md
Normal file
@@ -0,0 +1,114 @@
# CLAUDE.md: mana-swift-llm

Guidance for Claude Code in this repo.

> **Platform SOT:** [`../mana/docs/MANA_LLM.md`](../mana/docs/MANA_LLM.md)
> is the overarching architecture doc. This CLAUDE.md is the
> repo-local conventions doc.

## What this repo is

Swift package with local LLM backends for all native mana e.V.
apps. Two library products:

- **ManaLLM**: backend abstraction (FoundationModels, Gemma 4,
  router) + high-level API (`ManaLLM.summarize`, `.classify`,
  `.generate`). Pulls in the MLX-Swift-LM toolchain (~30 MB dependency).
- **ManaLLMShared**: App Group container helper for a shared
  HuggingFace cache. A slim library without the MLX dependency; apps
  that only need the container (e.g. for a different model setup)
  consume only this product.

Consumers today: `memoro-native`. Planned: all 12 native mana apps
(see the use-case map in `MANA_LLM.md`).

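The shared-cache idea behind `ManaLLMShared` can be sketched as follows. This is a minimal illustration, not the package's implementation: `resolveCacheURL`, the injected `groupLookup` closure, and the `hf-cache` subdirectory are assumptions made for the example. On Apple platforms the lookup would typically be `FileManager.default.containerURL(forSecurityApplicationGroupIdentifier:)`, which returns `nil` when the entitlement is missing.

```swift
import Foundation

/// Hypothetical resolver: prefer the App Group container, otherwise
/// fall back to an app-private directory. The lookup is injected so
/// the sketch runs without any entitlement.
func resolveCacheURL(
    groupID: String,
    fallback: URL,
    groupLookup: (String) -> URL?
) -> URL {
    let base = groupLookup(groupID) ?? fallback
    // "hf-cache" is an assumed directory name, not taken from the package.
    return base.appendingPathComponent("hf-cache", isDirectory: true)
}

let fallback = FileManager.default.temporaryDirectory
let cacheURL = resolveCacheURL(
    groupID: "group.ev.mana.models",
    fallback: fallback,
    groupLookup: { _ in nil } // simulate a missing entitlement
)

// Export the location so Hugging Face tooling reads the same directory.
setenv("HF_HUB_CACHE", cacheURL.path, 1)
print(cacheURL.path)
```

Injecting the lookup keeps the fallback path testable on machines that do not have the `group.ev.mana.models` entitlement.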
## Architecture invariants

Decided. Do not touch without explicit discussion.

1. **Own repo instead of a ManaCore extension.** `mana-swift-core`
   stays slim (architecture invariant "exactly two products":
   ManaCore + ManaTokens). The LLM toolchain is heavy and has its own
   version lifecycle (MLX-Swift updates frequently).
2. **MLX-Swift, no other LLM stack.** Apple-optimized inference
   on Apple Silicon (GPU via Metal, plus CPU). No llama.cpp forks, no
   custom Python bridge, no ONNX Runtime.
3. **Foundation Models preferred.** Apple's system model is
   shared by the OS, free, and ANE-accelerated. Apps use FM wherever
   its capability suffices; Gemma only when FM is not enough.
4. **Public API is `Sendable`.** Swift 6 strict concurrency.
5. **No PII in logs.** OSLog subsystem `ev.mana.llm`, all
   transcripts/prompts marked `privacy: .private`. Compliance
   ([`mana/docs/COMPLIANCE.md`](../mana/docs/COMPLIANCE.md)) applies.

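Invariant 5 can be illustrated with a small sketch. It assumes a hypothetical helper `redactedSummary(of:)` and a made-up log category `sketch`; only the subsystem name `ev.mana.llm` comes from this document. The idea is that user text is interpolated as `.private`, while content-free metadata such as a character count may be `.public`.

```swift
import Foundation
#if canImport(OSLog)
import OSLog
#endif

/// Content-free description of user text: only its length is exposed.
/// Hypothetical helper for this sketch.
func redactedSummary(of text: String) -> String {
    "\(text.count) chars"
}

let prompt = "private user text"
let summary = redactedSummary(of: prompt)

#if canImport(OSLog)
// The category name is an assumption; the subsystem is the one named above.
let logger = Logger(subsystem: "ev.mana.llm", category: "sketch")
logger.notice(
    "prompt received: \(prompt, privacy: .private) (\(summary, privacy: .public))"
)
#endif

print(summary)
```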
## Conventions

- **Swift 6.0** strict concurrency
- **iOS 18 / macOS 15** minimum (FoundationModels needs iOS 26+ /
  macOS 26+; ManaLLM checks availability at runtime)
- **MLX-Swift-LM** via `branch: main` (no stable tag until MLX
  v1.0; pinning to the branch is the de facto convention)
- **Doc comments** are mandatory on every `public` symbol (`///`)

## Versioning

- Semver with frequent patch releases (MLX-Swift moves fast)
- Git tags on `main` after every meaningful milestone
- CHANGELOG.md is mandatory: what changed, what apps need to adapt

## Developing locally

```bash
swift build      # builds both targets
swift test       # tests (FM paths gated on macOS 26+)
```

For integration into an app (memoro-native, later pageta, etc.):

```yaml
# project.yml
packages:
  ManaLLM:
    path: ../mana-swift-llm   # dev: direct path dependency
    # url: https://git.mana.how/till/mana-swift-llm.git
    # from: "0.1.0"

targets:
  YourApp:
    dependencies:
      - package: ManaLLM
        product: ManaLLM
      - package: ManaLLM
        product: ManaLLMShared
```

## Cross-repo docs

- [`../mana/docs/MANA_LLM.md`](../mana/docs/MANA_LLM.md):
  platform SOT
- [`../mana/docs/MANA_SWIFT.md`](../mana/docs/MANA_SWIFT.md):
  native platform SOT
- [`../mana/docs/COMPLIANCE.md`](../mana/docs/COMPLIANCE.md):
  data protection / telemetry rules
- [`../mana-swift-core/CLAUDE.md`](../mana-swift-core/CLAUDE.md):
  sister package
- [`../memoro-native/Sources/Core/AI/`](../memoro-native/Sources/Core/AI/):
  source-code origin (before the lift)

## Lift origin

Source: `memoro-native/Sources/Core/AI/` (2026-05-18). Files
taken over and generalized:

| Original | Here | Adjustment |
|---|---|---|
| `LLMBackend.swift` | `Sources/ManaLLM/LLMBackend.swift` | + generic `generate(prompt:instructions:maxTokens:)` method, `summarize` as default implementation |
| `LLMBackendID.swift` | `Sources/ManaLLM/LLMBackend.swift` | inlined |
| `AppleFMBackend.swift` | `Sources/ManaLLM/AppleFMBackend.swift` | identical |
| `GemmaBackend.swift` | `Sources/ManaLLM/GemmaBackend.swift` | `huggingFaceCacheRoot()` uses `ManaSharedModels` |
| `LLMRouter.swift` | `Sources/ManaLLM/LLMRouter.swift` | Memoro user-pref logic replaced by capability-based routing |
| `NoOpLLMBackend.swift` | `Sources/ManaLLM/NoOpBackend.swift` | renamed, identical |

New here (not from memoro):
- `ManaLLMShared/ManaSharedModels.swift`: App Group container helper
- `ManaLLM/ManaLLM.swift`: high-level facade API
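The cache root that `GemmaBackend` resolves through `ManaSharedModels` follows the Hugging Face hub cache layout (`models--<org>--<name>/snapshots/<revision>/…`). A minimal sketch of a "is this model already cached?" check under that layout is shown below; `hubFolderName(forRepo:)` and `hasSnapshot(cacheRoot:repoID:)` are hypothetical names, and the revision `abc123` is made up for the demo.

```swift
import Foundation

/// "org/name" -> "models--org--name" (Hugging Face hub cache layout).
func hubFolderName(forRepo repoID: String) -> String {
    "models--" + repoID.replacingOccurrences(of: "/", with: "--")
}

/// True when any snapshot directory below `cacheRoot` contains a config.json.
/// Uses the unencoded `path`, so directories with spaces resolve correctly.
func hasSnapshot(cacheRoot: URL, repoID: String) -> Bool {
    let fm = FileManager.default
    let snapshots = cacheRoot
        .appendingPathComponent(hubFolderName(forRepo: repoID))
        .appendingPathComponent("snapshots")
    guard let revisions = try? fm.contentsOfDirectory(atPath: snapshots.path) else {
        return false
    }
    return revisions.contains { revision in
        let config = snapshots
            .appendingPathComponent(revision)
            .appendingPathComponent("config.json")
        return fm.fileExists(atPath: config.path)
    }
}

// Demo against a throwaway directory whose name contains a space.
let root = FileManager.default.temporaryDirectory
    .appendingPathComponent("hf cache \(UUID().uuidString)")
let repo = "mlx-community/gemma-4-e2b-it-4bit"
let revisionDir = root
    .appendingPathComponent(hubFolderName(forRepo: repo))
    .appendingPathComponent("snapshots")
    .appendingPathComponent("abc123")
try FileManager.default.createDirectory(
    at: revisionDir, withIntermediateDirectories: true
)
_ = FileManager.default.createFile(
    atPath: revisionDir.appendingPathComponent("config.json").path,
    contents: Data("{}".utf8)
)
print(hasSnapshot(cacheRoot: root, repoID: repo))
```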
59
Package.swift
Normal file
@@ -0,0 +1,59 @@
// swift-tools-version: 6.0
import PackageDescription

/// `mana-swift-llm`: Swift package with local LLM backends for all
/// native mana e.V. apps (FoundationModels + Gemma 4 via MLX).
///
/// Two library products:
///
/// - **ManaLLM**: backend abstraction + router. Apps import this
///   module for `ManaLLM.summarize(...)` or direct backend access.
///   Pulls in MLX-Swift-LM + swift-transformers as dependencies.
///
/// - **ManaLLMShared**: App Group container helper for a shared
///   HuggingFace cache (`group.ev.mana.models`). Apps use it to
///   download models once and read them via mmap across all
///   participating apps. No MLX dependency; a slim utility library.
///
/// SOT docs: `mana/docs/MANA_LLM.md`. Platform context:
/// `mana/docs/MANA_SWIFT.md`. Deliberately its own repo (not in
/// mana-swift-core), because the MLX-Swift toolchain weighs ~30 MB
/// and ManaCore should stay lean (architecture invariant).
let package = Package(
    name: "mana-swift-llm",
    platforms: [
        .iOS(.v18),
        .macOS(.v15),
    ],
    products: [
        .library(name: "ManaLLM", targets: ["ManaLLM"]),
        .library(name: "ManaLLMShared", targets: ["ManaLLMShared"]),
    ],
    dependencies: [
        .package(url: "https://github.com/ml-explore/mlx-swift-lm", branch: "main"),
        .package(url: "https://github.com/huggingface/swift-huggingface", from: "0.9.0"),
        .package(url: "https://github.com/huggingface/swift-transformers", from: "1.3.0"),
    ],
    targets: [
        .target(
            name: "ManaLLM",
            dependencies: [
                "ManaLLMShared",
                .product(name: "MLXLLM", package: "mlx-swift-lm"),
                .product(name: "MLXLMCommon", package: "mlx-swift-lm"),
                .product(name: "MLXHuggingFace", package: "mlx-swift-lm"),
                .product(name: "HuggingFace", package: "swift-huggingface"),
                .product(name: "Tokenizers", package: "swift-transformers"),
            ]
        ),
        .target(
            name: "ManaLLMShared",
            dependencies: []
        ),
        .testTarget(
            name: "ManaLLMTests",
            dependencies: ["ManaLLM"]
        ),
    ]
)
47
README.md
Normal file
@@ -0,0 +1,47 @@
# mana-swift-llm

Swift package with local LLM backends for all native
[mana e.V.](https://mana-ev.ch) iOS/macOS apps.

Two library products:

- **`ManaLLM`**: backend abstraction + high-level facade.
  Backends: Apple Foundation Models, Gemma 4 E2B/E4B (via MLX-Swift),
  NoOp fallback. The router picks automatically by capability.

- **`ManaLLMShared`**: App Group container helper for a
  shared HuggingFace cache. Apps with the `group.ev.mana.models`
  entitlement share downloaded models: **one** app downloads,
  all others read.

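How a router might pick a backend can be sketched as an ordered preference list: take the first backend that reports itself available, otherwise fall back to the NoOp backend. The types below are hypothetical stand-ins; the real `LLMRouter` API is not shown in this document and may differ.

```swift
import Foundation

// Hypothetical stand-ins for the package's backend ID and availability types.
enum SketchBackendID: String {
    case noOp, appleFM, gemmaE2B
}

enum SketchAvailability: Equatable {
    case available
    case unavailable
}

/// Returns the first ID in `priority` that is available,
/// or `.noOp` when no listed backend qualifies.
func pickBackend(
    priority: [SketchBackendID],
    availability: [SketchBackendID: SketchAvailability]
) -> SketchBackendID {
    for id in priority where availability[id] == SketchAvailability.available {
        return id
    }
    return .noOp
}

// Foundation Models is preferred but unavailable here, so Gemma is chosen.
let choice = pickBackend(
    priority: [.appleFM, .gemmaE2B],
    availability: [.appleFM: .unavailable, .gemmaE2B: .available]
)
print(choice.rawValue)
```

Keeping the NoOp backend as the implicit last entry means callers always get a usable backend, even on devices without any model.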
## Quick start

```swift
import ManaLLM
import SwiftUI

@main
struct MeineApp: App {
    init() {
        // Point HF_HUB_CACHE at the shared container.
        ManaLLM.configure()
    }

    var body: some Scene {
        // ...
    }
}

// Somewhere later:
let summary = await ManaLLM.summarize(longText)
let story = await ManaLLM.generate(
    prompt: "Schreib eine kurze Reise-Story über Konstanz.",
    level: .creative
)
let tags = await ManaLLM.classify(text, into: ["#sport", "#kultur"])
```

## Platform docs

- [`../mana/docs/MANA_LLM.md`](https://git.mana.how/mana/mana/src/branch/main/docs/MANA_LLM.md): architecture + use-case map
- [`../mana/docs/MANA_SWIFT.md`](https://git.mana.how/mana/mana/src/branch/main/docs/MANA_SWIFT.md): native platform SOT
- [`CLAUDE.md`](CLAUDE.md): repo-local conventions
192
Sources/ManaLLM/AppleFMBackend.swift
Normal file
@@ -0,0 +1,192 @@
#if canImport(FoundationModels)
import FoundationModels
#endif
import Foundation
import OSLog

/// `LLMBackend` on top of Apple's `FoundationModels` framework (iOS 26+).
///
/// Runs on the same ~3 B model that powers Apple Intelligence.
/// ANE-accelerated, no model download: Apple ships the model with
/// the system. **System-shared:** all apps on the same device use the
/// same model instance, no cross-app setup required (unlike Gemma,
/// see `ManaSharedModels`).
///
/// **Token window:** 4096 (instructions + prompt + response). Longer
/// inputs are hard-clipped to ~3000 chars. Map-reduce over longer
/// inputs is the caller's responsibility.
public actor AppleFMBackend: LLMBackend {
    public let identifier: LLMBackendID = .appleFM

    public init() {}

    public func availability() async -> LLMAvailability {
        #if canImport(FoundationModels)
        if #available(iOS 26.0, macOS 26.0, *) {
            let model = SystemLanguageModel.default
            switch model.availability {
            case .available:
                return .available
            case let .unavailable(reason):
                switch reason {
                case .deviceNotEligible:
                    return .unavailableDeviceNotEligible
                case .modelNotReady:
                    return .unavailableModelNotReady
                case .appleIntelligenceNotEnabled:
                    return .unavailableAppleIntelligenceNotEnabled
                @unknown default:
                    return .unknown(String(describing: reason))
                }
            }
        }
        return .unavailableOSTooOld
        #else
        return .unavailableOSTooOld
        #endif
    }

    public func prepare(
        onProgress: @Sendable @escaping (LLMPrepareUpdate) -> Void
    ) async throws {
        onProgress(LLMPrepareUpdate(stage: .checking, fractionCompleted: 0))
        _ = await availability()
        // No explicit prepare path: Apple manages that. Reports "ready"
        // regardless of the availability value; the UI text comes
        // separately from `LLMRouter.availabilityMap()`.
        onProgress(LLMPrepareUpdate(stage: .ready, fractionCompleted: 1.0))
    }

    // MARK: - Generic generate

    public func generate(
        prompt: String,
        instructions: String?,
        maxTokens _: Int
    ) async -> String? {
        let trimmed = prompt.trimmingCharacters(in: .whitespacesAndNewlines)
        guard !trimmed.isEmpty else { return nil }
        #if canImport(FoundationModels)
        if #available(iOS 26.0, macOS 26.0, *) {
            return await runFoundationModelsGenerate(
                prompt: clip(trimmed),
                instructions: instructions
            )
        }
        return nil
        #else
        return nil
        #endif
    }

    // MARK: - Summary (Memoro-optimized path with @Generable)

    public func summarize(transcript: String) async -> LLMSummary? {
        let trimmed = transcript.trimmingCharacters(in: .whitespacesAndNewlines)
        guard !trimmed.isEmpty else { return nil }
        #if canImport(FoundationModels)
        if #available(iOS 26.0, macOS 26.0, *) {
            return await runFoundationModelsSummary(transcript: clip(trimmed))
        }
        return nil
        #else
        return nil
        #endif
    }

    /// Token-window heuristic: ~4 chars per token for German. We keep
    /// ~3000 chars (~750 prompt tokens) so that instructions + response
    /// still fit.
    private func clip(_ text: String) -> String {
        let max = 3000
        guard text.count > max else { return text }
        return String(text.prefix(max))
    }

    #if canImport(FoundationModels)
    @available(iOS 26.0, macOS 26.0, *)
    private func runFoundationModelsGenerate(
        prompt: String,
        instructions: String?
    ) async -> String? {
        let session: LanguageModelSession
        if let instructions, !instructions.isEmpty {
            session = LanguageModelSession(instructions: Instructions(instructions))
        } else {
            session = LanguageModelSession()
        }
        do {
            let response = try await session.respond(to: Prompt(prompt))
            let text = response.content.trimmingCharacters(in: .whitespacesAndNewlines)
            LLMLog.backend.notice(
                "AppleFM generate OK (\(text.count, privacy: .public) chars)"
            )
            return text
        } catch {
            let message = String(describing: error)
            LLMLog.backend.error(
                "AppleFM generate failed: \(message, privacy: .public)"
            )
            return nil
        }
    }

    @available(iOS 26.0, macOS 26.0, *)
    private func runFoundationModelsSummary(transcript: String) async -> LLMSummary? {
        let instructions = Instructions(
            "Du bist ein deutscher Assistent, der gesprochene Sprachmemos kurz "
                + "zusammenfasst. Antworte auf Deutsch, ohne Floskeln, ohne Anrede."
        )
        let session = LanguageModelSession(instructions: instructions)
        let prompt = Prompt(
            "Hier ist das Transkript einer Sprachmemo. Erzeuge eine prägnante "
                + "Überschrift (maximal 80 Zeichen, kein Punkt am Ende, "
                + "keine Anführungszeichen) und ein einleitendes Intro von 1–2 "
                + "Sätzen, das den Kern der Memo wiedergibt. Antworte ausschließlich "
                + "im geforderten Schema.\n\nTranskript:\n\(transcript)"
        )
        do {
            let response = try await session.respond(
                to: prompt,
                generating: GeneratedSummary.self
            )
            let summary = response.content
            let trimSet = CharacterSet(
                charactersIn: "\"\u{201E}\u{201C}\u{201D}.\u{00BB}\u{00AB}"
            )
            let cleanHeadline = summary.headline
                .trimmingCharacters(in: .whitespacesAndNewlines)
                .trimmingCharacters(in: trimSet)
            let cleanIntro = summary.intro
                .trimmingCharacters(in: .whitespacesAndNewlines)
            LLMLog.backend.notice(
                "AppleFM summary OK (headline=\(cleanHeadline.count, privacy: .public)c, intro=\(cleanIntro.count, privacy: .public)c)"
            )
            return LLMSummary(
                headline: String(cleanHeadline.prefix(80)),
                intro: cleanIntro
            )
        } catch {
            let message = String(describing: error)
            LLMLog.backend.error(
                "AppleFM summary failed: \(message, privacy: .public)"
            )
            return nil
        }
    }
    #endif
}

#if canImport(FoundationModels)
@available(iOS 26.0, macOS 26.0, *)
@Generable
private struct GeneratedSummary {
    @Guide(
        description: "Prägnante Überschrift auf Deutsch, maximal 80 Zeichen, ohne Punkt am Ende, ohne Anführungszeichen."
    )
    var headline: String

    @Guide(description: "Einleitung in 1–2 deutschen Sätzen, die den Kern der Sprachmemo zusammenfasst.")
    var intro: String
}
#endif
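`AppleFMBackend` clips inputs to about 3000 characters and leaves map-reduce over longer inputs to the caller. A minimal sketch of the "map" half is shown below, assuming a hypothetical helper `chunks(of:maxChars:)` that splits on plain character counts. A real caller would probably prefer sentence boundaries and would then summarize each chunk before summarizing the combined results.

```swift
import Foundation

/// Splits `text` into consecutive pieces of at most `maxChars` characters.
/// Hypothetical caller-side helper, not part of the package.
func chunks(of text: String, maxChars: Int) -> [String] {
    precondition(maxChars > 0, "maxChars must be positive")
    var result: [String] = []
    var rest = Substring(text)
    while !rest.isEmpty {
        let head = rest.prefix(maxChars)
        result.append(String(head))
        rest = rest.dropFirst(head.count)
    }
    return result
}

// A 7000-character input yields three pieces that each fit the clip limit.
let parts = chunks(of: String(repeating: "a", count: 7000), maxChars: 3000)
print(parts.map(\.count))
```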
282
Sources/ManaLLM/GemmaBackend.swift
Normal file
@@ -0,0 +1,282 @@
import Foundation
import HuggingFace
import ManaLLMShared
import MLXHuggingFace
import MLXLLM
import MLXLMCommon
import OSLog
import Tokenizers

/// `LLMBackend` on top of MLX-Swift-LM with a Gemma model from the
/// HuggingFace `mlx-community/` namespace. Downloads the model on the
/// first `prepare()` and keeps the `ModelContainer` in memory for the
/// app's lifetime.
///
/// **Cross-app sharing:** the HuggingFace cache lives at
/// `ManaSharedModels.effectiveCacheURL()`. With a correctly configured
/// App Group entitlement `group.ev.mana.models` that is the shared
/// container, otherwise the app's own Application Support fallback.
/// **One app downloads, all others read.**
///
/// **Model choice (May 2026):** `gemma4_e2b_it_4bit` and
/// `gemma4_e4b_it_4bit` from `LLMRegistry`. Direct sources on HF:
/// - mlx-community/gemma-4-e2b-it-4bit (~1.3 GB)
/// - mlx-community/gemma-4-e4b-it-4bit (~2.5 GB)
///
/// **Wi-Fi-only download:** the default. Apps can pass
/// `allowsCellular: true` to the initializer when the user explicitly
/// wants to download over cellular.
public actor GemmaBackend: LLMBackend {
    public enum Variant: Sendable {
        case e2b
        case e4b

        var modelConfiguration: ModelConfiguration {
            switch self {
            case .e2b: LLMRegistry.gemma4_e2b_it_4bit
            case .e4b: LLMRegistry.gemma4_e4b_it_4bit
            }
        }

        var estimatedBytes: Int64 {
            switch self {
            case .e2b: 3_614_000_000
            case .e4b: 5_250_000_000
            }
        }

        var hfRepoFolderName: String {
            switch self {
            case .e2b: "models--mlx-community--gemma-4-e2b-it-4bit"
            case .e4b: "models--mlx-community--gemma-4-e4b-it-4bit"
            }
        }

        var hfRepoID: String {
            switch self {
            case .e2b: "mlx-community/gemma-4-e2b-it-4bit"
            case .e4b: "mlx-community/gemma-4-e4b-it-4bit"
            }
        }
    }

    public let identifier: LLMBackendID
    private let variant: Variant
    private let allowsCellular: Bool
    private var container: ModelContainer?

    public init(variant: Variant, allowsCellular: Bool = false) {
        self.variant = variant
        self.allowsCellular = allowsCellular
        identifier = variant == .e2b ? .gemmaE2B : .gemmaE4B
    }

    // MARK: - Availability

    public func availability() async -> LLMAvailability {
        if container != nil { return .available }
        if isModelCached() { return .available }
        return .requiresDownload(estimatedBytes: variant.estimatedBytes)
    }

    private func isModelCached() -> Bool {
        guard let cacheRoot = huggingFaceCacheRoot() else { return false }
        let repoDir = cacheRoot
            .appending(path: variant.hfRepoFolderName)
            .appending(path: "snapshots")
        // `path()` percent-encodes by default; FileManager needs the raw
        // path, otherwise directories such as "Application Support" are
        // never found.
        guard FileManager.default.fileExists(atPath: repoDir.path(percentEncoded: false)) else {
            return false
        }
        if let entries = try? FileManager.default.contentsOfDirectory(
            at: repoDir, includingPropertiesForKeys: nil
        ) {
            for entry in entries {
                let cfg = entry.appending(path: "config.json")
                if FileManager.default.fileExists(atPath: cfg.path(percentEncoded: false)) {
                    return true
                }
            }
        }
        return false
    }

    /// HF cache path. Priority:
    /// 1. `HF_HUB_CACHE` environment variable (e.g. set at app boot via
    ///    `ManaSharedModels.configureHuggingFaceCacheEnv()`, the
    ///    standard for mana apps).
    /// 2. `ManaSharedModels.effectiveCacheURL()`: the App Group
    ///    container if available, otherwise the app's own App Support.
    private func huggingFaceCacheRoot() -> URL? {
        if let envCache = ProcessInfo.processInfo.environment["HF_HUB_CACHE"] {
            return URL(fileURLWithPath: envCache)
        }
        return ManaSharedModels.effectiveCacheURL()
    }

    // MARK: - Prepare (Download + Init)

    public func prepare(
        onProgress: @Sendable @escaping (LLMPrepareUpdate) -> Void
    ) async throws {
        if container != nil {
            onProgress(LLMPrepareUpdate(stage: .ready, fractionCompleted: 1.0))
            return
        }
        onProgress(LLMPrepareUpdate(stage: .downloading, fractionCompleted: 0))
        let hub = makeHubClient()
        do {
            let loaded = try await LLMModelFactory.shared.loadContainer(
                from: #hubDownloader(hub),
                using: #huggingFaceTokenizerLoader(),
                configuration: variant.modelConfiguration
            ) { progress in
                let total = progress.totalUnitCount
                let done = progress.completedUnitCount
                let fraction = progress.fractionCompleted
                LLMLog.download.debug(
                    "Gemma progress: completed=\(done, privacy: .public)/\(total, privacy: .public) fraction=\(fraction, privacy: .public)"
                )
                onProgress(LLMPrepareUpdate(
                    stage: .downloading,
                    fractionCompleted: fraction,
                    bytesCompleted: done > 0 ? done : nil,
                    bytesTotal: total > 1 ? total : nil
                ))
            }
            container = loaded
            onProgress(LLMPrepareUpdate(stage: .ready, fractionCompleted: 1.0))
            let name = variant.modelConfiguration.name
            LLMLog.backend.notice("GemmaBackend ready (\(name, privacy: .public))")
        } catch {
            let message = String(describing: error)
            LLMLog.backend.error("GemmaBackend prepare failed: \(message, privacy: .public)")
            throw error
        }
    }

    private func makeHubClient() -> HubClient {
        let config = URLSessionConfiguration.default
        config.allowsCellularAccess = allowsCellular
        config.timeoutIntervalForRequest = 60
        config.timeoutIntervalForResource = 7200
        config.waitsForConnectivity = true
        let session = URLSession(configuration: config)

        let cache: HubCache? = huggingFaceCacheRoot().map {
            HubCache(cacheDirectory: $0)
        }
        return HubClient(session: session, cache: cache)
    }

    // MARK: - Generate

    public func generate(
        prompt: String,
        instructions: String?,
        maxTokens _: Int
    ) async -> String? {
        let trimmed = prompt.trimmingCharacters(in: .whitespacesAndNewlines)
        guard !trimmed.isEmpty else { return nil }
        guard let container else {
            LLMLog.backend.notice("GemmaBackend.generate called before prepare, returning nil")
            return nil
        }
        let session = ChatSession(
            container,
            instructions: instructions ?? ""
        )
        do {
            let response = try await session.respond(to: trimmed)
            let text = response.trimmingCharacters(in: .whitespacesAndNewlines)
            LLMLog.backend.notice(
                "Gemma generate OK (\(text.count, privacy: .public) chars)"
            )
            return text
        } catch {
            let message = String(describing: error)
            LLMLog.backend.error("Gemma generate failed: \(message, privacy: .public)")
            return nil
        }
    }

    // MARK: - Summary (Gemma-optimized JSON path)

    public func summarize(transcript: String) async -> LLMSummary? {
        let trimmed = transcript.trimmingCharacters(in: .whitespacesAndNewlines)
        guard !trimmed.isEmpty else { return nil }
        guard let container else {
            LLMLog.backend.notice("GemmaBackend.summarize called before prepare, returning nil")
            return nil
        }

        // Gemma 4 has a 256 K token window, so we clip more gently than
        // Apple FM (8000 chars instead of 3000). Map-reduce over long
        // inputs is the caller's responsibility.
        let clipped = String(trimmed.prefix(8000))

        let instructions = "Du bist ein deutscher Assistent, der gesprochene "
            + "Sprachmemos kurz zusammenfasst. Antworte auf Deutsch, ohne Floskeln, "
            + "ohne Anrede. Antworte ausschließlich als JSON-Objekt mit den "
            + "Feldern \"headline\" (String, maximal 80 Zeichen, kein Punkt am Ende, "
            + "keine Anführungszeichen) und \"intro\" (String, 1–2 Sätze). "
            + "Keine zusätzlichen Felder, kein Markdown, keine Erklärungen."

        let prompt = "Transkript:\n\(clipped)\n\nGib jetzt das JSON aus."

        let session = ChatSession(container, instructions: instructions)
        do {
            let response = try await session.respond(to: prompt)
            return parseSummary(response)
        } catch {
            let message = String(describing: error)
            LLMLog.backend.error("GemmaBackend summarize failed: \(message, privacy: .public)")
            return nil
        }
    }

    /// Deletes the model from the HF cache. Caution: in a shared
    /// container this affects ALL participating apps.
    public func removeCachedModel() throws {
        container = nil
        try ManaSharedModels.removeModel(repo: variant.hfRepoID)
        LLMLog.backend.notice(
            "GemmaBackend removed cache for \(self.variant.hfRepoFolderName, privacy: .public)"
        )
    }

    /// Extracts headline + intro from a model output. JSON is
    /// preferred, with a rough heuristic as fallback (small models
    /// do not always stick to the schema).
    private func parseSummary(_ raw: String) -> LLMSummary? {
        let trimmed = raw.trimmingCharacters(in: .whitespacesAndNewlines)
        // `jsonStart < jsonEnd` guards against output where the last "}"
        // precedes the first "{"; forming that range would trap.
        if let jsonStart = trimmed.firstIndex(of: "{"),
           let jsonEnd = trimmed.lastIndex(of: "}"),
           jsonStart < jsonEnd
        {
            let jsonString = String(trimmed[jsonStart ... jsonEnd])
            if let data = jsonString.data(using: .utf8),
               let obj = try? JSONSerialization.jsonObject(with: data) as? [String: Any],
               let headline = obj["headline"] as? String,
               let intro = obj["intro"] as? String
            {
                let trimSet = CharacterSet(
                    charactersIn: "\"\u{201E}\u{201C}\u{201D}.\u{00BB}\u{00AB}"
                )
                let cleanHeadline = headline.trimmingCharacters(in: .whitespacesAndNewlines)
                    .trimmingCharacters(in: trimSet)
                let cleanIntro = intro.trimmingCharacters(in: .whitespacesAndNewlines)
                LLMLog.backend.notice(
                    "GemmaBackend summary OK (json, headline=\(cleanHeadline.count, privacy: .public)c)"
                )
                return LLMSummary(
                    headline: String(cleanHeadline.prefix(80)),
                    intro: cleanIntro
                )
            }
        }
        LLMLog.backend.notice("GemmaBackend summary fallback (kein valides JSON)")
        let sentences = trimmed.split(separator: ".", maxSplits: 1, omittingEmptySubsequences: true)
        let headline = String(sentences.first ?? "").prefix(80)
        let intro = sentences.count > 1
            ? String(sentences[1]).trimmingCharacters(in: .whitespacesAndNewlines)
            : ""
        return LLMSummary(headline: String(headline), intro: intro)
    }
}
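The JSON-first parsing in `parseSummary` can also be written with `Decodable`. The sketch below is an alternative illustration, not the code in this commit: `SketchSummary` and `extractSummary(from:)` are hypothetical names. It decodes the span from the first `{` to the last `}` and returns `nil` when no valid object is found, leaving any sentence-based fallback to the caller.

```swift
import Foundation

/// Hypothetical mirror of the headline/intro pair used above.
struct SketchSummary: Decodable, Equatable {
    let headline: String
    let intro: String
}

/// Decodes the span from the first "{" to the last "}" in `raw`.
/// Returns nil when the braces are missing, reversed, or not valid JSON.
func extractSummary(from raw: String) -> SketchSummary? {
    guard let start = raw.firstIndex(of: "{"),
          let end = raw.lastIndex(of: "}"),
          start < end
    else { return nil }
    let json = Data(raw[start ... end].utf8)
    return try? JSONDecoder().decode(SketchSummary.self, from: json)
}

// Small models often wrap the object in extra chatter.
let noisy = "Antwort: {\"headline\": \"Einkauf planen\", \"intro\": \"Liste für Samstag.\"} Ende."
if let parsed = extractSummary(from: noisy) {
    print(parsed.headline)
}
```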
186
Sources/ManaLLM/LLMBackend.swift
Normal file
@@ -0,0 +1,186 @@
import Foundation

/// Uniform protocol for all LLM backends in `ManaLLM`.
///
/// Implementations:
/// - `NoOpBackend`: no LLM, first-sentences fallback.
/// - `AppleFMBackend`: Apple Foundation Models (iOS 26+,
///   Apple Intelligence devices).
/// - `GemmaBackend`: Gemma 4 E2B/E4B via MLX-Swift, downloaded
///   locally (or loaded from the `ManaSharedModels` container when
///   the app is in the `group.ev.mana.models` group).
///
/// **API design:** `generate(...)` is the generic method for free
/// prompts. `summarize(...)` is a convenience inherited from Memoro
/// and has a default implementation built on `generate`, so that all
/// backends support it automatically.
///
/// `LLMRouter` is the typical caller and picks the backend by
/// capability + availability + app preference.
public protocol LLMBackend: Sendable {
    var identifier: LLMBackendID { get }
    func availability() async -> LLMAvailability

    /// Idempotent prepare step: system model check (Apple FM),
    /// model download (Gemma), no-op (NoOpBackend).
    func prepare(onProgress: @Sendable @escaping (LLMPrepareUpdate) -> Void) async throws

    /// Generic generation: takes a prompt, returns a string.
    /// `instructions` is an optional system prompt (FoundationModels:
    /// `Instructions`, Gemma: the `ChatSession` instructions).
    /// `maxTokens` limits the output length (backends may cap it).
    ///
    /// Never throws; returns `nil` on failure. The UI then renders
    /// "backend unavailable".
    func generate(
        prompt: String,
        instructions: String?,
        maxTokens: Int
    ) async -> String?

    /// Convenience inherited from Memoro: returns headline + intro
    /// for a transcript. The default implementation calls `generate`
    /// with a standard summary prompt; backends can override it for
    /// model-specific optimizations.
    func summarize(transcript: String) async -> LLMSummary?
}

public extension LLMBackend {
    /// Default implementation of `summarize` built on `generate`.
    /// Backends with an optimized summary path (e.g. AppleFMBackend
    /// with a FoundationModels schema) override it.
    func summarize(transcript: String) async -> LLMSummary? {
        let instructions = """
        Du bist ein Assistent, der Audio-Transkripte in eine prägnante
        Headline und einen kurzen Intro-Satz auf Deutsch destilliert.
        Antworte im exakten Format:

        HEADLINE: <max 60 Zeichen, prägnant, ohne Punkt>
        INTRO: <ein vollständiger Satz, max 200 Zeichen>
        """
        let prompt = "Transkript:\n\n\(transcript)"
        guard let output = await generate(
            prompt: prompt,
            instructions: instructions,
            maxTokens: 200
        ) else { return nil }
        return LLMSummary.parse(output)
    }
}

/// Stable IDs for `UserDefaults` persistence (rawValue) and
/// SwiftUI pickers.
public enum LLMBackendID: String, CaseIterable, Sendable {
    case noOp
    case appleFM
    case gemmaE2B
    case gemmaE4B

    public var displayName: String {
        switch self {
        case .noOp: "Kein LLM (Fallback)"
        case .appleFM: "Apple Foundation Models (3 B)"
        case .gemmaE2B: "Gemma 4 E2B (2 B, ~1.3 GB)"
        case .gemmaE4B: "Gemma 4 E4B (4 B, ~2.5 GB)"
        }
    }

    public var isOnDeviceLLM: Bool {
        self != .noOp
    }
}

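Because the raw values are meant to be persisted, a stored priority list can contain IDs that no longer exist. The following is a minimal sketch of defensive decoding, assuming the app stores the list as an array of raw strings. `decodePreferred` is a hypothetical helper and not part of the package; the enum is repeated only so the snippet compiles on its own.

```swift
// Reduced copy of the enum above so the snippet is self-contained.
enum LLMBackendID: String, CaseIterable {
    case noOp, appleFM, gemmaE2B, gemmaE4B
}

/// Hypothetical helper: decodes a persisted list of raw values, dropping
/// unknown entries and duplicates while keeping the stored order.
func decodePreferred(_ rawValues: [String]) -> [LLMBackendID] {
    var seen = Set<LLMBackendID>()
    var result: [LLMBackendID] = []
    for raw in rawValues {
        // Skip raw values that no longer map to a case, and repeats.
        guard let id = LLMBackendID(rawValue: raw), seen.insert(id).inserted else {
            continue
        }
        result.append(id)
    }
    // Assumption: NoOp should always be present as the last resort.
    if !seen.contains(.noOp) {
        result.append(.noOp)
    }
    return result
}
```

The result could be handed to `LLMRouter.setPreferred(_:)`. Appending `.noOp` is a design choice of this sketch, not something the package enforces.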
public struct LLMSummary: Equatable, Sendable {
    public let headline: String
    public let intro: String

    public init(headline: String, intro: String) {
        self.headline = headline
        self.intro = intro
    }

    /// Parses the default implementation's output
    /// ("HEADLINE: ...\nINTRO: ..."). Returns `nil` for a malformed
    /// format (the caller then falls back to `firstSentence`).
    public static func parse(_ output: String) -> LLMSummary? {
        var headline: String?
        var intro: String?
        for rawLine in output.split(separator: "\n", omittingEmptySubsequences: true) {
            let line = rawLine.trimmingCharacters(in: .whitespaces)
            if line.uppercased().hasPrefix("HEADLINE:") {
                headline = line.dropPrefix(caseInsensitive: "HEADLINE:")
                    .trimmingCharacters(in: .whitespaces)
            } else if line.uppercased().hasPrefix("INTRO:") {
                intro = line.dropPrefix(caseInsensitive: "INTRO:")
                    .trimmingCharacters(in: .whitespaces)
            }
        }
        guard let h = headline, !h.isEmpty, let i = intro, !i.isEmpty else {
            return nil
        }
        return LLMSummary(headline: h, intro: i)
    }
}

private extension String {
    /// Removes `prefix` from the start of the string, ignoring case.
    /// Returns the string unchanged if it does not start with `prefix`.
    func dropPrefix(caseInsensitive prefix: String) -> String {
        let lower = self.lowercased()
        let prefixLower = prefix.lowercased()
        guard lower.hasPrefix(prefixLower) else { return self }
        return String(self.dropFirst(prefix.count))
    }
}

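The doc comment on `parse` mentions a `firstSentence` fallback that lives in the caller and is not shown in this commit. What follows is only a sketch of what such a heuristic might look like; the function name, its signature, and the 60-character default (borrowed from the headline limit in the prompt) are assumptions.

```swift
import Foundation

/// Hypothetical fallback: the first sentence of `text` without its
/// terminator, truncated to at most `maxLength` characters.
func firstSentence(of text: String, maxLength: Int = 60) -> String {
    let trimmed = text.trimmingCharacters(in: .whitespacesAndNewlines)
    let terminators: Set<Character> = [".", "!", "?"]
    // Cut at the first sentence terminator, if there is one.
    let sentence: Substring
    if let end = trimmed.firstIndex(where: { terminators.contains($0) }) {
        sentence = trimmed[..<end]
    } else {
        sentence = trimmed[...]
    }
    // Truncate, then drop whitespace the cut may have left behind.
    return String(sentence.prefix(maxLength))
        .trimmingCharacters(in: .whitespaces)
}
```

A caller could use it as `LLMSummary.parse(output)?.headline ?? firstSentence(of: transcript)`. Splitting on the first `.`, `!` or `?` is deliberately naive and will cut abbreviations such as "z.B." short.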
public enum LLMAvailability: Equatable, Sendable {
    case available
    case requiresDownload(estimatedBytes: Int64)
    case downloading(fractionCompleted: Double)
    case unavailableDeviceNotEligible
    case unavailableModelNotReady
    case unavailableAppleIntelligenceNotEnabled
    case unavailableOSTooOld
    case unavailableMissingDependency(String)
    case unknown(String)

    /// Whether the toggle in Settings should be selectable.
    public var isSelectable: Bool {
        switch self {
        case .available, .requiresDownload, .downloading:
            true
        default:
            false
        }
    }
}

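A settings list usually needs a short text per availability state. The sketch below shows one possible mapping; `statusText(for:)` and all of its strings are hypothetical, and the enum is copied without its access modifiers so the snippet compiles standalone.

```swift
import Foundation

// Copy of the enum above so the snippet is self-contained.
enum LLMAvailability: Equatable {
    case available
    case requiresDownload(estimatedBytes: Int64)
    case downloading(fractionCompleted: Double)
    case unavailableDeviceNotEligible
    case unavailableModelNotReady
    case unavailableAppleIntelligenceNotEnabled
    case unavailableOSTooOld
    case unavailableMissingDependency(String)
    case unknown(String)
}

/// Hypothetical helper: a short status text for a settings row.
func statusText(for availability: LLMAvailability) -> String {
    switch availability {
    case .available:
        return "Ready"
    case .requiresDownload(let bytes):
        // Decimal gigabytes, one fractional digit.
        let gigabytes = Double(bytes) / 1_000_000_000
        return String(format: "Download required (%.1f GB)", gigabytes)
    case .downloading(let fraction):
        return "Downloading \(Int((fraction * 100).rounded())) %"
    case .unavailableDeviceNotEligible:
        return "Device not supported"
    case .unavailableModelNotReady:
        return "Model not ready yet"
    case .unavailableAppleIntelligenceNotEnabled:
        return "Apple Intelligence is turned off"
    case .unavailableOSTooOld:
        return "OS version too old"
    case .unavailableMissingDependency(let name):
        return "Missing dependency: \(name)"
    case .unknown(let reason):
        return "Unavailable: \(reason)"
    }
}
```

Switching exhaustively, without a `default`, means a newly added case produces a compile error here instead of a silently wrong label.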
public enum LLMPrepareStage: Equatable, Sendable {
    case checking
    case downloading
    case initializing
    case ready
}

/// Richer progress event than a bare `Double`: optionally carries
/// byte counts. The UI can show something meaningful even while
/// `fractionCompleted` is still 0 at the start (this happens with
/// HF LFS downloads until the first URLSession callback arrives).
/// The byte values can be `nil` when the source does not know them
/// (Apple FM, NoOp).
public struct LLMPrepareUpdate: Equatable, Sendable {
    public let stage: LLMPrepareStage
    public let fractionCompleted: Double
    public let bytesCompleted: Int64?
    public let bytesTotal: Int64?

    public init(
        stage: LLMPrepareStage,
        fractionCompleted: Double,
        bytesCompleted: Int64? = nil,
        bytesTotal: Int64? = nil
    ) {
        self.stage = stage
        self.fractionCompleted = fractionCompleted
        self.bytesCompleted = bytesCompleted
        self.bytesTotal = bytesTotal
    }
}
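To illustrate why the byte counts are useful, here is a small sketch of a progress label that prefers bytes when both values are known and falls back to a percentage otherwise. `progressLabel(for:)` is a hypothetical helper; the struct is a reduced copy that keeps only the fields the helper reads.

```swift
// Reduced copy of the struct above so the snippet is self-contained.
struct LLMPrepareUpdate {
    let fractionCompleted: Double
    let bytesCompleted: Int64?
    let bytesTotal: Int64?
}

/// Hypothetical helper: text for a download progress row.
func progressLabel(for update: LLMPrepareUpdate) -> String {
    // Prefer byte counts: they are meaningful even at fraction 0.
    if let done = update.bytesCompleted,
       let total = update.bytesTotal,
       total > 0 {
        let megabytes: (Int64) -> Int64 = { $0 / 1_000_000 }
        return "\(megabytes(done)) MB of \(megabytes(total)) MB"
    }
    // Otherwise show a percentage, clamped to 0...100.
    let clamped = min(max(update.fractionCompleted, 0), 1)
    return "\(Int((clamped * 100).rounded())) %"
}
```

With a known total the label reads "0 MB of 1300 MB" right away, which tells the user more than a bar stuck at zero.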
11
Sources/ManaLLM/LLMLog.swift
Normal file
@ -0,0 +1,11 @@
import Foundation
import OSLog

/// OSLog namespaces for ManaLLM. Uniform across apps under the
/// subsystem `ev.mana.llm`; apps can follow along in Console.app or via
/// `log stream --predicate 'subsystem == "ev.mana.llm"'`.
enum LLMLog {
    static let backend = Logger(subsystem: "ev.mana.llm", category: "backend")
    static let router = Logger(subsystem: "ev.mana.llm", category: "router")
    static let download = Logger(subsystem: "ev.mana.llm", category: "download")
}
108
Sources/ManaLLM/LLMRouter.swift
Normal file
@ -0,0 +1,108 @@
import Foundation
import OSLog

/// Central hub: holds all LLM backends, picks the matching one by app
/// preference and availability, and exposes a uniform
/// `generate`/`summarize` API.
///
/// **Routing logic in `currentBackend()`:**
///
/// 1. The app passes a priority list (`preferred`), e.g.
///    `[.appleFM, .gemmaE2B, .noOp]`.
/// 2. The router asks each backend for its `availability()` and takes
///    the first one that is `.available`. If there is none, it takes
///    the first one whose status is `isSelectable` (this includes
///    `.requiresDownload` and `.downloading`).
/// 3. If nothing is usable, it returns `NoOpBackend`. The UI renders
///    `availabilityMap()` separately so the user knows why.
///
/// Apps can also bypass this entirely and instantiate a backend
/// directly, e.g. `await AppleFMBackend().generate(...)`.
public actor LLMRouter {
    /// Convenient cross-app default with all four backends.
    /// Apps with fewer backends override this.
    public static let shared = LLMRouter()

    private let appleFM = AppleFMBackend()
    private let noOp = NoOpBackend()
    private let gemmaE2B = GemmaBackend(variant: .e2b)
    private let gemmaE4B = GemmaBackend(variant: .e4b)

    /// Priority order of the backends. Apps can adjust it per use
    /// case, e.g. moodlit wants Gemma E2B before FM (creative
    /// mapping), pageta wants FM first (summary).
    private var preferred: [LLMBackendID]

    public init(preferred: [LLMBackendID] = [.appleFM, .gemmaE2B, .gemmaE4B, .noOp]) {
        self.preferred = preferred
    }

    public func setPreferred(_ ids: [LLMBackendID]) {
        preferred = ids
    }

    public func backend(for id: LLMBackendID) -> LLMBackend {
        switch id {
        case .noOp: noOp
        case .appleFM: appleFM
        case .gemmaE2B: gemmaE2B
        case .gemmaE4B: gemmaE4B
        }
    }

    /// Picks the first available backend from the preference list.
    /// Strictly prefers `.available` (model ready). If no real backend
    /// is `.available`, falls back to the first selectable entry in
    /// the list, such as one in the `.requiresDownload` state.
    /// Last resort: NoOp.
    public func currentBackend() async -> LLMBackend {
        // First pass: `.available` only. NoOp is skipped here because
        // it always reports `.available` and would otherwise win
        // before the second pass is ever reached.
        for id in preferred where id != .noOp {
            let candidate = backend(for: id)
            if await candidate.availability() == .available {
                return candidate
            }
        }
        // Second pass: anything selectable (including a pending download).
        for id in preferred {
            let candidate = backend(for: id)
            let avail = await candidate.availability()
            if avail.isSelectable {
                LLMLog.router.notice(
                    "Router: no .available backend, picking \(id.rawValue, privacy: .public) (\(String(describing: avail), privacy: .public))"
                )
                return candidate
            }
        }
        LLMLog.router.notice("Router: no usable backend, falling back to NoOp")
        return noOp
    }

    // MARK: - Convenience

    public func generate(
        prompt: String,
        instructions: String? = nil,
        maxTokens: Int = 500
    ) async -> String? {
        let backend = await currentBackend()
        return await backend.generate(
            prompt: prompt,
            instructions: instructions,
            maxTokens: maxTokens
        )
    }

    public func summarize(transcript: String) async -> LLMSummary? {
        let backend = await currentBackend()
        return await backend.summarize(transcript: transcript)
    }

    /// UI helper: the availability status per backend ID, e.g. for the
    /// settings list. Queries the backends one after another.
    public func availabilityMap() async -> [LLMBackendID: LLMAvailability] {
        var result: [LLMBackendID: LLMAvailability] = [:]
        for id in LLMBackendID.allCases {
            result[id] = await backend(for: id).availability()
        }
        return result
    }
}
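The selection order described in the router's doc comments can be written as a pure function, which makes it easy to reason about and to unit-test without real models. The sketch below is a simplified model under stated assumptions: `BackendID`, `Availability`, and `pickBackend` are stand-ins invented for illustration, availability is passed in as a dictionary instead of being queried asynchronously, and NoOp is treated purely as a last resort.

```swift
// Simplified stand-ins for LLMBackendID and LLMAvailability.
enum BackendID: String {
    case noOp, appleFM, gemmaE2B, gemmaE4B
}

enum Availability: Equatable {
    case available
    case requiresDownload
    case unavailable

    var isSelectable: Bool { self != .unavailable }
}

/// Two-pass pick: first a real backend that is `.available`, then the
/// first selectable entry (which may still need a download), and
/// `.noOp` if the list yields nothing.
func pickBackend(
    preferred: [BackendID],
    availability: [BackendID: Availability]
) -> BackendID {
    // NoOp always counts as available; unknown IDs count as unavailable.
    func status(_ id: BackendID) -> Availability {
        id == .noOp ? .available : (availability[id] ?? .unavailable)
    }
    // Pass 1: ready-to-use backends only, NoOp excluded.
    if let ready = preferred.first(where: { $0 != .noOp && status($0) == .available }) {
        return ready
    }
    // Pass 2: anything selectable, in list order.
    if let selectable = preferred.first(where: { status($0).isSelectable }) {
        return selectable
    }
    return .noOp
}
```

Excluding NoOp from the first pass matters: because it always reports `.available`, including it there would make the download fallback unreachable whenever `.noOp` is in the list.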
123
Sources/ManaLLM/ManaLLM.swift
Normal file
@ -0,0 +1,123 @@
import Foundation
import ManaLLMShared

/// High-level facade for local LLM calls in mana apps.
///
/// Apps typically consume only these three static methods:
///
/// ```swift
/// import ManaLLM
///
/// // At app boot (e.g. in the @main App init):
/// ManaLLM.configure()
///
/// // Somewhere later:
/// let summary = await ManaLLM.summarize(longText)
/// let tags = await ManaLLM.classify(text, into: ["#sport", "#politik"])
/// let story = await ManaLLM.generate(
///     prompt: "Schreib eine kurze Reise-Story über Konstanz.",
///     level: .creative
/// )
/// ```
///
/// **Level mapping to backends:**
/// - `.fast` → AppleFM first, then Gemma E2B, then Gemma E4B
/// - `.creative` → Gemma E2B first, then AppleFM, then Gemma E4B
/// - `.deep` → Gemma E4B first, then Gemma E2B, then AppleFM
///
/// Never throws; returns `nil` (or an empty set) on failure. Apps then
/// render a fallback heuristic.
public enum ManaLLM {
    /// Shared router with the default backend priority, for apps that
    /// call the router directly:
    /// ```swift
    /// await ManaLLM.router.setPreferred([.gemmaE2B, .appleFM])
    /// ```
    /// The facade methods below build their own router from `level`
    /// on every call, so this setting does not affect them.
    public static let router = LLMRouter.shared

    /// Boot side effect: points HF_HUB_CACHE at the shared container.
    /// Call as early as possible (e.g. in the `@main` `init()`).
    @discardableResult
    public static func configure() -> URL? {
        ManaSharedModels.configureHuggingFaceCacheEnv()
    }

    /// Quality level for routing.
    public enum Level: Sendable {
        case fast      // AppleFM first: standard tasks
        case creative  // Gemma E2B first: story/mood/caption
        case deep      // Gemma E4B first: long context/Q&A
    }

    // MARK: - High-Level Operations

    /// Free-form generation with an optional system prompt.
    public static func generate(
        prompt: String,
        instructions: String? = nil,
        level: Level = .fast,
        maxTokens: Int = 500
    ) async -> String? {
        let preferred = backendPriority(for: level)
        let router = LLMRouter(preferred: preferred)
        return await router.generate(
            prompt: prompt,
            instructions: instructions,
            maxTokens: maxTokens
        )
    }

    /// Inherited from Memoro: headline and intro for a long transcript.
    public static func summarize(
        _ text: String,
        level: Level = .fast
    ) async -> LLMSummary? {
        let preferred = backendPriority(for: level)
        let router = LLMRouter(preferred: preferred)
        return await router.summarize(transcript: text)
    }

    /// Classification into predefined labels. Returns the subset of
    /// labels that fit according to the LLM. On a parse failure:
    /// empty set.
    public static func classify(
        _ text: String,
        into labels: [String],
        level: Level = .fast
    ) async -> Set<String> {
        guard !labels.isEmpty else { return [] }
        let labelList = labels.joined(separator: ", ")
        let instructions = """
        Du bist ein Klassifikator. Gegeben ein Text, wähle aus der Label-
        Liste GENAU die Labels, die zum Text passen. Antworte
        ausschließlich mit den passenden Labels, durch Komma getrennt,
        ohne Erklärung, ohne Markdown.

        Labels: \(labelList)
        """
        guard let output = await generate(
            prompt: "Text:\n\n\(text)",
            instructions: instructions,
            level: level,
            maxTokens: 100
        ) else { return [] }
        // Keep only labels the caller offered; matching is exact.
        let valid = Set(labels)
        let picked = output
            .split(separator: ",")
            .map { $0.trimmingCharacters(in: .whitespacesAndNewlines) }
            .filter { valid.contains($0) }
        return Set(picked)
    }

    // MARK: - Internal

    private static func backendPriority(for level: Level) -> [LLMBackendID] {
        switch level {
        case .fast:
            return [.appleFM, .gemmaE2B, .gemmaE4B, .noOp]
        case .creative:
            return [.gemmaE2B, .appleFM, .gemmaE4B, .noOp]
        case .deep:
            return [.gemmaE4B, .gemmaE2B, .appleFM, .noOp]
        }
    }
}
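The label filter in `classify` matches exactly, so an answer such as `#Sport` or `- "#politik"` is dropped. If that turns out to be too strict for a given model, a more tolerant filter could look like the sketch below. `pickLabels(from:allowed:)` is a hypothetical variant rather than the package's behavior, and the set of characters treated as noise is an assumption.

```swift
import Foundation

/// Hypothetical variant of the label filter: matches case-insensitively,
/// accepts commas or newlines as separators, and ignores quotes and
/// list bullets around a label.
func pickLabels(from output: String, allowed labels: [String]) -> Set<String> {
    // Map the lowercase form back to the caller's canonical spelling.
    var canonical: [String: String] = [:]
    for label in labels {
        canonical[label.lowercased()] = label
    }
    // Characters stripped from both ends of each candidate.
    let noise = CharacterSet.whitespacesAndNewlines
        .union(CharacterSet(charactersIn: "\"'`*-•"))
    let picked = output
        .components(separatedBy: CharacterSet(charactersIn: ",\n"))
        .map { $0.trimmingCharacters(in: noise).lowercased() }
        .compactMap { canonical[$0] }
    return Set(picked)
}
```

The result still contains only labels the caller passed in, so a hallucinated label can never leak through. Note that labels which themselves start or end with one of the noise characters would not match.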
37
Sources/ManaLLM/NoOpBackend.swift
Normal file
@ -0,0 +1,37 @@
import Foundation

/// Fallback backend without an LLM. Always returns `nil` from
/// `generate` and `summarize`; the caller (`LLMRouter` or an app
/// directly) catches that and uses its own heuristic (first
/// sentences, a static template, ...).
///
/// It exists as a first-class type so that Settings can offer a
/// deterministic picker entry and routing does not drown in
/// optional handling.
public actor NoOpBackend: LLMBackend {
    public let identifier: LLMBackendID = .noOp

    public init() {}

    public func availability() async -> LLMAvailability {
        .available
    }

    public func prepare(
        onProgress: @Sendable @escaping (LLMPrepareUpdate) -> Void
    ) async throws {
        onProgress(LLMPrepareUpdate(stage: .ready, fractionCompleted: 1.0))
    }

    public func generate(
        prompt _: String,
        instructions _: String?,
        maxTokens _: Int
    ) async -> String? {
        nil
    }

    public func summarize(transcript _: String) async -> LLMSummary? {
        nil
    }
}
148
Sources/ManaLLMShared/ManaSharedModels.swift
Normal file
@ -0,0 +1,148 @@
import Foundation

/// Container for the HuggingFace cache, shared across all mana e.V.
/// apps via the app group `group.ev.mana.models`. Apps with this
/// group in their entitlements read models from the same path, so
/// nothing is downloaded twice.
///
/// **Setup per app:**
///
/// 1. Apple Developer portal: open the App ID → App Groups capability →
///    add `group.ev.mana.models`.
/// 2. `project.yml` entitlement:
///    ```yaml
///    com.apple.security.application-groups:
///      - group.ev.mana.<app>
///      - group.ev.mana.models
///    ```
/// 3. App code at boot:
///    ```swift
///    ManaSharedModels.configureHuggingFaceCacheEnv()
///    ```
///    With that, MLX-Swift's HuggingFace Hub client automatically
///    points at the shared container on the first `Hub.snapshot(...)`.
///
/// SOT docs: `mana/docs/MANA_LLM.md`.
public enum ManaSharedModels {
    /// Canonical app group for the shared model container.
    public static let appGroup = "group.ev.mana.models"

    /// Path convention inside the container, compatible with the
    /// HuggingFace Hub default layout (`~/.cache/huggingface/hub/...`).
    public static let hubSubdirectory = "huggingface/hub"

    /// URL of the HuggingFace cache root in the shared container.
    ///
    /// Returns `nil` if:
    /// - the app lacks the `group.ev.mana.models` entitlement, or
    /// - the app runs in a mode where app group containers are not
    ///   accessible (e.g. certain extension contexts).
    ///
    /// Callers then fall back to the app's own `Application Support`
    /// directory, which `legacyCacheURL()` provides.
    public static func cacheURL() -> URL? {
        guard let container = FileManager.default.containerURL(
            forSecurityApplicationGroupIdentifier: appGroup
        ) else {
            return nil
        }
        let hub = container.appending(path: hubSubdirectory)
        // Create the directory and exclude it from iCloud backup.
        try? FileManager.default.createDirectory(at: hub, withIntermediateDirectories: true)
        var hubVar = hub
        var values = URLResourceValues()
        values.isExcludedFromBackup = true
        try? hubVar.setResourceValues(values)
        return hub
    }

    /// Fallback when `cacheURL()` is `nil`: the app's own
    /// `Application Support/huggingface/hub`. Counterpart to the
    /// previous memoro-specific path: no sharing, but functional.
    public static func legacyCacheURL() -> URL? {
        guard let appSupport = try? FileManager.default.url(
            for: .applicationSupportDirectory,
            in: .userDomainMask,
            appropriateFor: nil,
            create: true
        ) else {
            return nil
        }
        let hub = appSupport.appending(path: hubSubdirectory)
        // Create the directory and exclude it from iCloud backup.
        try? FileManager.default.createDirectory(at: hub, withIntermediateDirectories: true)
        var hubVar = hub
        var values = URLResourceValues()
        values.isExcludedFromBackup = true
        try? hubVar.setResourceValues(values)
        return hub
    }

    /// Preferred cache path: the shared container if available,
    /// otherwise the app's own legacy Application Support directory.
    public static func effectiveCacheURL() -> URL? {
        cacheURL() ?? legacyCacheURL()
    }

    /// Sets the `HF_HUB_CACHE` environment variable to the shared
    /// container. MLX-Swift's `HubClient` and swift-huggingface's
    /// `Hub.snapshot(...)` read this variable at boot.
    ///
    /// Idempotent. If the shared container is not accessible, the
    /// variable is set to the legacy path, so apps do not have to
    /// write any fallback code.
    ///
    /// **Important:** call this function as early as possible during
    /// app boot (before the first LLM call), e.g. in the `init()` of
    /// the `@main` app struct.
    @discardableResult
    public static func configureHuggingFaceCacheEnv() -> URL? {
        guard let url = effectiveCacheURL() else { return nil }
        setenv("HF_HUB_CACHE", url.path, 1)
        return url
    }

    /// Returns the URL of a specific model repo directory in the
    /// cache. Convenience counterpart to the HuggingFace path scheme
    /// `<hub>/models--<owner>--<name>`.
    ///
    /// Example:
    /// ```swift
    /// ManaSharedModels.modelDirURL(repo: "mlx-community/gemma-4-e2b-it-4bit")
    /// // → <hub>/models--mlx-community--gemma-4-e2b-it-4bit
    /// ```
    public static func modelDirURL(repo: String) -> URL? {
        guard let hub = effectiveCacheURL() else { return nil }
        let dirName = "models--" + repo.replacingOccurrences(of: "/", with: "--")
        return hub.appending(path: dirName)
    }

    /// Best-effort size estimate of the cache contents in bytes.
    /// Settings views can use it to show "Local models: 3.8 GB".
    public static func cacheSizeBytes() -> Int64 {
        guard let hub = effectiveCacheURL() else { return 0 }
        guard let enumerator = FileManager.default.enumerator(
            at: hub,
            includingPropertiesForKeys: [.totalFileAllocatedSizeKey, .isRegularFileKey]
        ) else {
            return 0
        }
        var total: Int64 = 0
        for case let url as URL in enumerator {
            let values = try? url.resourceValues(forKeys: [.totalFileAllocatedSizeKey, .isRegularFileKey])
            if values?.isRegularFile == true, let size = values?.totalFileAllocatedSize {
                total += Int64(size)
            }
        }
        return total
    }

    /// Deletes a specific model repo from the shared cache. Caution:
    /// this affects all participating apps. UIs should communicate
    /// that explicitly ("Remove models for all mana apps").
    public static func removeModel(repo: String) throws {
        guard let dir = modelDirURL(repo: repo) else { return }
        // `path()` percent-encodes by default, which breaks the check
        // for paths with spaces such as "Application Support".
        guard FileManager.default.fileExists(atPath: dir.path(percentEncoded: false)) else { return }
        try FileManager.default.removeItem(at: dir)
    }
}
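The `models--<owner>--<name>` naming used by `modelDirURL(repo:)` can also be read backwards, for example to list installed models by scanning the hub directory. The sketch below shows both directions as free functions. `cacheDirectoryName(forRepo:)` mirrors the string logic above, while `repoID(fromCacheDirectoryName:)` is a hypothetical inverse that assumes the first `--` after the prefix separates owner and name.

```swift
import Foundation

/// Directory name for a repo ID, following `models--<owner>--<name>`.
func cacheDirectoryName(forRepo repo: String) -> String {
    "models--" + repo.replacingOccurrences(of: "/", with: "--")
}

/// Hypothetical inverse: recovers the repo ID from a directory name.
/// Returns `nil` for names that do not follow the scheme.
func repoID(fromCacheDirectoryName name: String) -> String? {
    let prefix = "models--"
    guard name.hasPrefix(prefix) else { return nil }
    let rest = String(name.dropFirst(prefix.count))
    // Assumption: the first "--" separates owner and model name.
    guard let separator = rest.range(of: "--") else { return nil }
    let owner = rest[..<separator.lowerBound]
    let model = rest[separator.upperBound...]
    guard !owner.isEmpty, !model.isEmpty else { return nil }
    return "\(owner)/\(model)"
}
```

The inverse is ambiguous for owners that contain `--` themselves, so it should be seen as a display aid and not as a source of truth for deletion.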
36
Tests/ManaLLMTests/LLMSummaryParserTests.swift
Normal file
@ -0,0 +1,36 @@
import Testing
@testable import ManaLLM

/// Smoke tests for the default summary parser. Backend-specific
/// tests (Apple FM, Gemma) require real models and run in
/// memoro-native as integration smoke tests.
struct LLMSummaryParserTests {
    @Test func parsesValidOutput() throws {
        let output = """
        HEADLINE: Spaziergang am Bodensee
        INTRO: Heute ein langer Spaziergang am Konstanzer Ufer mit guten Gedanken.
        """
        let summary = try #require(LLMSummary.parse(output))
        #expect(summary.headline == "Spaziergang am Bodensee")
        #expect(summary.intro.hasPrefix("Heute ein langer Spaziergang"))
    }

    @Test func parsesLowercaseLabels() throws {
        let output = """
        headline: Notiz
        intro: Kurze Notiz für später.
        """
        let summary = try #require(LLMSummary.parse(output))
        #expect(summary.headline == "Notiz")
    }

    @Test func returnsNilOnMalformedOutput() {
        let output = "Da ist nur ein Satz ohne Struktur."
        #expect(LLMSummary.parse(output) == nil)
    }

    @Test func returnsNilWhenIntroMissing() {
        let output = "HEADLINE: Nur Headline"
        #expect(LLMSummary.parse(output) == nil)
    }
}