Mirror of https://github.com/Memo-2023/mana-monorepo.git (synced 2026-05-15 23:19:40 +02:00)
PillNav overhaul:
- Dropdown-as-bar: theme/AI/sync/user menus render as horizontal bars in the bottom stack (PillDropdownBar) instead of floating popovers. New onOpenBar/activeBarId props on PillNavigation.
- iconOnly pills: tags/search/workbench-tabs pills show only icons. Home pill removed. New iconOnly flag on PillNavItem.
- Segmented toggle groups: items sharing a `group` id render as a single segmented pill (e.g. the Light/Dark/System triple).
- Fullscreen mode: press "f" to hide all bottom chrome, Esc to exit.
- QuickInputBar + bottom bar visibility toggles via new pills.
- Progress ring on the AI trigger pill during model download (conic-gradient ::after, follows the pill border-radius).

@mana/local-stt — new package for browser-local speech-to-text:
- Whisper models via transformers.js v4 (WebGPU + WASM fallback; see the loader sketch below)
- Same Web Worker architecture as @mana/local-llm
- Two models: Whisper Tiny (150 MB) and Whisper Small (950 MB)
- Reactive Svelte 5 bindings (getLocalSttStatus, loadLocalStt, transcribe)

Voice-to-text integration:
- useLocalStt() composable: mic capture via AudioContext + ScriptProcessor, resampled to 16 kHz mono and fed into the Whisper worker (see the capture sketch below)
- Mic button in QuickInputBar (leftAction slot) with recording/loading/transcribing states + pulse animation
- Transcribed text injected into InputBar via the new injectedText prop
- STT model selector in the AI bar alongside the LLM tier controls

Also: vite.config.ts server.fs.allow expanded to the monorepo root so workspace package workers resolve in dev.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
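The worker's loading path isn't shown on this page; below is a minimal sketch of the WebGPU-then-WASM fallback the commit describes, using the transformers.js pipeline API. The function name and the retry-on-catch structure are illustrative assumptions, not the actual @mana/local-stt worker code.

import { pipeline } from '@huggingface/transformers';

// Hypothetical loader sketch: try the WebGPU backend first, fall back to WASM.
// device/dtype are real transformers.js pipeline options; everything else here
// is an assumption about how @mana/local-stt wires them up.
async function loadWhisper(modelId: string) {
  try {
    return await pipeline('automatic-speech-recognition', modelId, {
      device: 'webgpu',
      dtype: 'fp32',
    });
  } catch {
    // WebGPU missing or failed to initialize: retry on the WASM backend.
    return await pipeline('automatic-speech-recognition', modelId, {
      device: 'wasm',
      dtype: 'fp32',
    });
  }
}

// Usage: Whisper expects 16 kHz mono Float32Array input.
// const transcriber = await loadWhisper('onnx-community/whisper-tiny');
// const { text } = await transcriber(samples16k);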
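Similarly, here is a browser-only sketch of the useLocalStt() capture path described above (AudioContext + ScriptProcessor, downsampled to the 16 kHz mono input Whisper expects). The 4096-sample buffer size and the linear-interpolation resampler are assumptions; the real composable may differ.

// Hypothetical capture sketch, not the actual useLocalStt() implementation.
async function captureAndResample(stop: Promise<void>): Promise<Float32Array> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const ctx = new AudioContext();
  const source = ctx.createMediaStreamSource(stream);
  // ScriptProcessorNode is deprecated but still broadly supported;
  // the 4096-sample buffer is an illustrative choice.
  const processor = ctx.createScriptProcessor(4096, 1, 1);
  const chunks: Float32Array[] = [];

  processor.onaudioprocess = (e) => {
    // Copy: the underlying buffer is reused between callbacks.
    chunks.push(new Float32Array(e.inputBuffer.getChannelData(0)));
  };
  source.connect(processor);
  processor.connect(ctx.destination);

  await stop; // the caller resolves this when recording ends
  processor.disconnect();
  source.disconnect();
  stream.getTracks().forEach((t) => t.stop());

  // Concatenate the captured mono chunks.
  const total = chunks.reduce((n, c) => n + c.length, 0);
  const mono = new Float32Array(total);
  let offset = 0;
  for (const c of chunks) {
    mono.set(c, offset);
    offset += c.length;
  }

  // Downsample from the device rate (typically 44.1/48 kHz) to 16 kHz
  // by linear interpolation.
  const ratio = ctx.sampleRate / 16000;
  const out = new Float32Array(Math.floor(mono.length / ratio));
  for (let i = 0; i < out.length; i++) {
    const pos = i * ratio;
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, mono.length - 1);
    const frac = pos - i0;
    out[i] = mono[i0] * (1 - frac) + mono[i1] * frac;
  }
  await ctx.close();
  return out;
}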
38 lines
1.3 KiB
TypeScript
import type { SttModelConfig } from './types';

/**
 * Pre-configured Whisper models for client-side speech-to-text.
 *
 * All models are ONNX builds loaded via @huggingface/transformers (transformers.js)
 * with the WebGPU backend. English-only variants are smaller and faster for
 * single-language use; multilingual models auto-detect the spoken language.
 *
 * Model quality/size trade-off (English WER on LibriSpeech test-clean):
 *   tiny.en:  ~5.6% — 39M params, very fast, good enough for dictation
 *   base.en:  ~4.3% — 74M params, noticeably better on accents/noise
 *   small.en: ~3.4% — 244M params, near-human accuracy, slower
 *   tiny:     ~7.6% — multilingual, auto-detects language
 *   base:     ~5.0% — multilingual
 *   small:    ~3.9% — multilingual
 */
export const MODELS = {
  'whisper-tiny': {
    modelId: 'onnx-community/whisper-tiny',
    displayName: 'Whisper Tiny',
    dtype: 'fp32',
    downloadSizeMb: 150,
    ramUsageMb: 300,
  },
  'whisper-small': {
    modelId: 'onnx-community/whisper-small',
    displayName: 'Whisper Small',
    dtype: 'fp32',
    downloadSizeMb: 950,
    ramUsageMb: 1500,
  },
} as const satisfies Record<string, SttModelConfig>;

export type ModelKey = keyof typeof MODELS;

export const DEFAULT_MODEL: ModelKey = 'whisper-tiny';
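SttModelConfig itself lives in './types' and isn't shown on this page; a shape consistent with the fields used above would look like this (a hypothetical reconstruction, not the actual definition):

// Hypothetical reconstruction of SttModelConfig from './types';
// field meanings are inferred from how MODELS uses them.
export interface SttModelConfig {
  modelId: string;        // Hugging Face repo id of the ONNX build
  displayName: string;    // label shown in the STT model selector
  dtype: string;          // weight precision handed to transformers.js (e.g. 'fp32')
  downloadSizeMb: number; // approximate download size, drives the progress ring
  ramUsageMb: number;     // approximate in-memory footprint once loaded
}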
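And a plausible call sequence for the Svelte bindings the commit names; only the identifiers getLocalSttStatus, loadLocalStt, and transcribe come from the commit message, so the argument and return shapes below are guesses:

import { getLocalSttStatus, loadLocalStt, transcribe, DEFAULT_MODEL } from '@mana/local-stt';
// Assumes DEFAULT_MODEL is re-exported from the package root.

// Assumed shapes: loadLocalStt takes a ModelKey, transcribe takes 16 kHz mono samples.
async function dictate(samples16k: Float32Array): Promise<string> {
  await loadLocalStt(DEFAULT_MODEL);  // downloads and warms the worker on first use
  console.log(getLocalSttStatus());   // reactive status/progress; exact shape unknown
  return await transcribe(samples16k);
}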