managarten/packages/local-stt/src/models.ts

commit 3deee755b3 by Till JS
feat(web): PillNav bar mode, fullscreen, local STT + mic button
PillNav overhaul:
- Dropdown-as-bar: theme/AI/sync/user menus render as horizontal
  bars in the bottom stack (PillDropdownBar) instead of floating
  popovers. New onOpenBar/activeBarId props on PillNavigation.
- iconOnly pills: tags/search/workbench-tabs pills show only icons.
  Home pill removed. New iconOnly flag on PillNavItem.
- Segmented toggle groups: items sharing a `group` id render as a
  single segmented pill (e.g. Light/Dark/System triple).
- Fullscreen mode: press "f" to hide all bottom chrome, Esc to exit.
- QuickInputBar + bottom bar visibility toggles via new pills.
- Progress ring on AI trigger pill during model download
  (conic-gradient ::after, follows pill border-radius).
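The conic-gradient progress ring described above could be produced along these lines. A sketch only: the `progressRingStyle` helper and the `--ring-color` variable are illustrative names, not the actual implementation.

```typescript
// Sketch: build the conic-gradient background for a download-progress ring
// drawn in a pill's ::after pseudo-element. The element's border-radius
// clips the gradient so the ring follows the pill shape.
function progressRingStyle(fraction: number): string {
  // Clamp to [0, 1] so a stray value never produces an invalid gradient.
  const pct = Math.min(1, Math.max(0, fraction)) * 100;
  // Filled arc up to `pct`%, transparent for the remainder.
  return `conic-gradient(var(--ring-color, #4f8cff) ${pct}%, transparent ${pct}%)`;
}
```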

@mana/local-stt — new package for browser-local speech-to-text:
- Whisper models via transformers.js v4 (WebGPU + WASM fallback)
- Same Web Worker architecture as @mana/local-llm
- Two models: Whisper Tiny (150 MB) and Whisper Small (950 MB)
- Reactive Svelte 5 bindings (getLocalSttStatus, loadLocalStt, transcribe)
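Since the worker lifecycle mirrors @mana/local-llm, progress messages from the worker presumably drive the reactive status. A sketch of such a mapping; the message union and status names here are assumptions modeled on typical transformers.js worker setups, not the package's real protocol.

```typescript
// Sketch: reduce worker messages into a reactive STT status. Illustrative
// shapes only — the actual @mana/local-stt protocol may differ.
type SttStatus =
  | { state: 'idle' }
  | { state: 'loading'; progressPct: number }
  | { state: 'ready' }
  | { state: 'error'; message: string };

type WorkerMsg =
  | { type: 'progress'; loaded: number; total: number }
  | { type: 'ready' }
  | { type: 'error'; message: string };

function reduceStatus(_prev: SttStatus, msg: WorkerMsg): SttStatus {
  switch (msg.type) {
    case 'progress':
      // Download progress feeds the AI pill's ring (see above).
      return { state: 'loading', progressPct: (msg.loaded / msg.total) * 100 };
    case 'ready':
      return { state: 'ready' };
    case 'error':
      return { state: 'error', message: msg.message };
  }
}
```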

Voice-to-text integration:
- useLocalStt() composable: mic capture via AudioContext +
  ScriptProcessor, resample to 16kHz mono, feed into Whisper worker
- Mic button in QuickInputBar (leftAction slot) with
  recording/loading/transcribing states + pulse animation
- Transcribed text injected into InputBar via new injectedText prop
- STT model selector in AI bar alongside LLM tier controls
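The resample-to-16kHz step in the composable above can be sketched as a simple linear-interpolation downsampler. This is an assumption about the method; the real composable may resample differently (e.g. via OfflineAudioContext).

```typescript
// Sketch: downsample mono Float32Array audio from the AudioContext's native
// rate (often 44.1 or 48 kHz) to Whisper's expected 16 kHz mono input.
function resampleTo16k(input: Float32Array, inputRate: number): Float32Array {
  const targetRate = 16000;
  if (inputRate === targetRate) return input;
  const ratio = inputRate / targetRate;
  const outLength = Math.floor(input.length / ratio);
  const out = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio;
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1);
    const frac = pos - i0;
    // Linear interpolation between the two nearest source samples.
    out[i] = input[i0] * (1 - frac) + input[i1] * frac;
  }
  return out;
}
```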

Also: vite.config.ts server.fs.allow expanded to monorepo root
so workspace package workers resolve in dev.
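The dev-server change might look roughly like this (a sketch; the exact path computation is an assumption, only the `server.fs.allow` expansion is from the commit):

```typescript
// vite.config.ts (sketch) — widen fs.allow so worker files imported from
// workspace packages outside the app root resolve during dev.
import { defineConfig } from 'vite';
import { fileURLToPath } from 'node:url';

export default defineConfig({
  server: {
    fs: {
      // Allow serving files from the monorepo root, not just the app dir.
      allow: [fileURLToPath(new URL('../..', import.meta.url))],
    },
  },
});
```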

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Date: 2026-04-12 16:05:43 +02:00


import type { SttModelConfig } from './types';

/**
 * Pre-configured Whisper models for client-side speech-to-text.
 *
 * All models are ONNX builds loaded via @huggingface/transformers (transformers.js)
 * with the WebGPU backend. English-only variants are smaller and faster for
 * single-language use; multilingual models auto-detect the spoken language.
 *
 * Model quality/size trade-off (English WER on LibriSpeech test-clean):
 *   tiny.en:  ~5.6% — 39M params, very fast, good enough for dictation
 *   base.en:  ~4.3% — 74M params, noticeably better on accents/noise
 *   small.en: ~3.4% — 244M params, near-human accuracy, slower
 *   tiny:     ~7.6% — multilingual, auto-detects language
 *   base:     ~5.0% — multilingual
 *   small:    ~3.9% — multilingual
 */
export const MODELS = {
  'whisper-tiny': {
    modelId: 'onnx-community/whisper-tiny',
    displayName: 'Whisper Tiny',
    dtype: 'fp32',
    downloadSizeMb: 150,
    ramUsageMb: 300,
  },
  'whisper-small': {
    modelId: 'onnx-community/whisper-small',
    displayName: 'Whisper Small',
    dtype: 'fp32',
    downloadSizeMb: 950,
    ramUsageMb: 1500,
  },
} as const satisfies Record<string, SttModelConfig>;
export type ModelKey = keyof typeof MODELS;
export const DEFAULT_MODEL: ModelKey = 'whisper-tiny';
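A caller might select a model from this table by resource budget. A hypothetical helper, not part of the package; the inline table below mirrors the `ramUsageMb` values above for illustration.

```typescript
// Sketch: pick the largest model whose RAM estimate fits a budget,
// falling back to a default key. `pickModelForRam` is hypothetical.
function pickModelForRam(
  models: Record<string, { ramUsageMb: number }>,
  budgetMb: number,
  fallback: string,
): string {
  let best = fallback;
  let bestRam = -1;
  for (const [key, cfg] of Object.entries(models)) {
    if (cfg.ramUsageMb <= budgetMb && cfg.ramUsageMb > bestRam) {
      best = key;
      bestRam = cfg.ramUsageMb;
    }
  }
  return best;
}

// Mirrors the MODELS table's RAM estimates (300 MB tiny, 1500 MB small).
const ramTable = {
  'whisper-tiny': { ramUsageMb: 300 },
  'whisper-small': { ramUsageMb: 1500 },
};
```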