mirror of
https://github.com/Memo-2023/mana-monorepo.git
synced 2026-05-18 19:01:23 +02:00
Integrate new transcriber application for AI-powered YouTube video
transcription with full monorepo structure and Groq Whisper API support.
## App Structure
- apps/transcriber/apps/backend - NestJS API server (port 3006)
- apps/transcriber/apps/web - SvelteKit web application
- apps/transcriber/apps/landing - Astro marketing/content site
- apps/transcriber/apps/mobile - Expo React Native app
- apps/transcriber/packages/shared-types - Shared TypeScript types
## Backend Features
- YouTube video download via yt-dlp (child_process)
- Ultra-fast transcription via Groq Whisper API (~300x realtime)
- Fallback to local Whisper for offline use
- Job queue with background processing
- Real-time progress updates via WebSocket (Socket.io)
- Playlist management for batch processing
- Health check endpoints
## API Endpoints
- POST /transcription - Start transcription job
- GET /transcription - List all jobs
- GET /transcription/:id - Get job status
- DELETE /transcription/:id - Cancel job
- GET /transcription/stats - Statistics
- GET /whisper/models - Available models
- GET/POST/DELETE /playlist - Playlist management
- GET /health - Health checks
## Whisper Models
- Groq: whisper-large-v3-turbo (fast, $0.04/hr)
- Groq: whisper-large-v3 (accurate, $0.111/hr)
- Local: tiny, base, small, medium, large
## Monorepo Integration
- Added to pnpm workspace via apps/*/apps/* pattern
- Root scripts: transcriber:dev, dev:transcriber:*
- Package naming: @transcriber/{backend,web,landing,mobile}
- Turbo tasks: dev, build, lint, type-check
- CLAUDE.md documentation
## Technology Stack
- Backend: NestJS 10, TypeScript, Socket.io
- Web: SvelteKit 2, Svelte 5, Tailwind CSS
- Landing: Astro 4, Solid.js, Tailwind CSS
- Mobile: Expo 52, React Native, NativeWind, Zustand
- Transcription: Groq Whisper API (OpenAI-compatible)
## Migration from Python
- Original Python/FastAPI code preserved in legacy/
- Full rewrite to TypeScript/NestJS
- Same functionality with improved architecture
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
CLAUDE.md - Transcriber
This file provides guidance to Claude Code when working with the Transcriber project.
Project Overview
Transcriber is an AI-powered YouTube video transcription application with:
- YouTube video download via yt-dlp
- Ultra-fast audio transcription using Groq Whisper API (~300x realtime)
- Fallback to local Whisper for offline use
- Playlist management for batch processing
- Real-time progress updates via WebSocket
- Multi-platform support (Web, Mobile, Landing)
Architecture
apps/transcriber/
├── apps/
│ ├── backend/ # NestJS API server (port 3006)
│ ├── web/ # SvelteKit web application
│ ├── landing/ # Astro landing/content site
│ └── mobile/ # Expo React Native app
├── packages/
│ └── shared-types/ # Shared TypeScript types
├── data/ # Transcripts & playlists (gitignored)
├── legacy/ # Original Python code (reference)
├── package.json # Root orchestrator
└── CLAUDE.md # This file
Quick Start
Prerequisites
- Node.js 20+
- pnpm 9.15.0+
- yt-dlp installed (brew install yt-dlp on macOS)
- For local Whisper: Python 3 with openai-whisper package
Development
# From monorepo root
pnpm install
# Start all transcriber apps
pnpm transcriber:dev
# Start individual apps
pnpm dev:transcriber:backend # NestJS backend (port 3006)
pnpm dev:transcriber:web # SvelteKit web (port 5173)
pnpm dev:transcriber:landing # Astro landing (port 4321)
pnpm dev:transcriber:mobile # Expo mobile
# Start web + backend together
pnpm dev:transcriber:app
Environment Variables
Create apps/transcriber/apps/backend/.env:
PORT=3006
WHISPER_PROVIDER=groq # groq or local
WHISPER_MODEL=whisper-large-v3-turbo # whisper-large-v3-turbo, whisper-large-v3 (groq) | tiny, base, small, medium, large (local)
GROQ_API_KEY=gsk_... # Required for Groq provider
TEMP_AUDIO_DIR=./temp_audio
TRANSCRIPTS_DIR=./data/transcripts
PLAYLISTS_DIR=./data/playlists
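The variables above interact: the valid WHISPER_MODEL values depend on WHISPER_PROVIDER, and GROQ_API_KEY only matters for the Groq provider. A minimal sketch of a startup check that enforces this is shown below. The helper name loadBackendConfig, the BackendConfig shape, and the fallback defaults are assumptions for illustration; they are not taken from the backend code.

```typescript
// Hypothetical helper, not taken from the Transcriber codebase.
type WhisperProvider = 'groq' | 'local';

interface BackendConfig {
  port: number;
  provider: WhisperProvider;
  model: string;
  groqApiKey: string | undefined;
}

// Model names as listed in this document.
const MODELS: Record<WhisperProvider, readonly string[]> = {
  groq: ['whisper-large-v3-turbo', 'whisper-large-v3'],
  local: ['tiny', 'base', 'small', 'medium', 'large'],
};

// Assumed defaults; the real backend may choose differently.
const DEFAULT_MODEL: Record<WhisperProvider, string> = {
  groq: 'whisper-large-v3-turbo',
  local: 'base',
};

function isProvider(value: string): value is WhisperProvider {
  return value === 'groq' || value === 'local';
}

function loadBackendConfig(env: Record<string, string | undefined>): BackendConfig {
  const provider = env['WHISPER_PROVIDER'] ?? 'groq';
  if (!isProvider(provider)) {
    throw new Error(`WHISPER_PROVIDER must be "groq" or "local", got "${provider}"`);
  }

  // The model must belong to the selected provider.
  const model = env['WHISPER_MODEL'] ?? DEFAULT_MODEL[provider];
  if (!MODELS[provider].includes(model)) {
    throw new Error(`WHISPER_MODEL "${model}" is not valid for provider "${provider}"`);
  }

  // The API key is only mandatory for the Groq provider.
  const groqApiKey = env['GROQ_API_KEY'];
  if (provider === 'groq' && !groqApiKey) {
    throw new Error('GROQ_API_KEY is required when WHISPER_PROVIDER=groq');
  }

  const port = Number(env['PORT'] ?? '3006');
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`PORT must be an integer between 1 and 65535, got "${env['PORT']}"`);
  }

  return { port, provider, model, groqApiKey };
}
```

In a Node.js process this would typically be called once at startup as loadBackendConfig(process.env), so that a misconfigured .env fails fast instead of failing on the first transcription job.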
API Endpoints
Transcription
| Method | Endpoint | Description |
|---|---|---|
| POST | /transcription | Start new transcription job |
| GET | /transcription | List all jobs |
| GET | /transcription/:id | Get job status |
| DELETE | /transcription/:id | Cancel job |
| GET | /transcription/stats | Get statistics |
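The routes in this table can be captured in a small typed helper so that clients do not assemble URLs by hand. The sketch below only builds request descriptors (method, URL, optional JSON body). The names ApiRequest and transcriptionRequests are hypothetical, and the request body of POST /transcription is not documented here, so the payload is passed through as an opaque object.

```typescript
// Hypothetical client-side helper; names are illustrative only.
interface ApiRequest {
  method: 'GET' | 'POST' | 'DELETE';
  url: string;
  body?: string;
}

function transcriptionRequests(baseUrl: string) {
  // Drop trailing slashes so the joined URL has exactly one separator.
  const root = `${baseUrl.replace(/\/+$/, '')}/transcription`;

  return {
    // POST /transcription (payload shape is an assumption left to the caller)
    start: (payload: Record<string, unknown>): ApiRequest => ({
      method: 'POST',
      url: root,
      body: JSON.stringify(payload),
    }),
    // GET /transcription
    list: (): ApiRequest => ({ method: 'GET', url: root }),
    // GET /transcription/:id
    status: (id: string): ApiRequest => ({
      method: 'GET',
      url: `${root}/${encodeURIComponent(id)}`,
    }),
    // DELETE /transcription/:id
    cancel: (id: string): ApiRequest => ({
      method: 'DELETE',
      url: `${root}/${encodeURIComponent(id)}`,
    }),
    // GET /transcription/stats
    stats: (): ApiRequest => ({ method: 'GET', url: `${root}/stats` }),
  };
}
```

A descriptor can then be handed to fetch, for example fetch(req.url, { method: req.method, body: req.body, headers: { 'Content-Type': 'application/json' } }). Keeping URL construction separate from the network call makes the route mapping easy to unit test.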
Playlists
| Method | Endpoint | Description |
|---|---|---|
| GET | /playlist | List all playlists |
| GET | /playlist/:category/:name | Get specific playlist |
| POST | /playlist | Create playlist |
| DELETE | /playlist/:category/:name | Delete playlist |
Whisper
| Method | Endpoint | Description |
|---|---|---|
| GET | /whisper/models | Get available models |
Health
| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Health check |
| GET | /health/ready | Readiness check |
| GET | /health/live | Liveness check |
WebSocket
Connect to the /progress namespace for real-time updates:
import { io } from 'socket.io-client';
const socket = io('http://localhost:3006/progress');
socket.on('job_update', (data) => {
// { type, jobId, status, progress, videoInfo }
});
socket.on('job_complete', (data) => {
// { type, jobId, status, transcriptPath }
});
socket.on('job_error', (data) => {
// { type, jobId, error }
});
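A client usually wants one consolidated view per job rather than three separate event streams. The sketch below folds the three events into a per-job snapshot. The field names follow the payload comments above, but their types, the JobTracker class, and the 'unknown' placeholder status are assumptions for illustration.

```typescript
// Payload fields follow the comments in the snippet above; the types are assumed.
interface JobUpdateEvent {
  type: string;
  jobId: string;
  status: string;
  progress: number;
  videoInfo?: unknown;
}
interface JobCompleteEvent {
  type: string;
  jobId: string;
  status: string;
  transcriptPath: string;
}
interface JobErrorEvent {
  type: string;
  jobId: string;
  error: string;
}

interface JobSnapshot {
  jobId: string;
  status: string;
  progress: number | undefined;
  transcriptPath: string | undefined;
  error: string | undefined;
}

// Hypothetical helper that merges incoming events into one record per job.
class JobTracker {
  private readonly jobs = new Map<string, JobSnapshot>();

  private upsert(jobId: string, patch: Partial<JobSnapshot>): JobSnapshot {
    const previous: JobSnapshot = this.jobs.get(jobId) ?? {
      jobId,
      status: 'unknown', // placeholder until the first status arrives
      progress: undefined,
      transcriptPath: undefined,
      error: undefined,
    };
    const next: JobSnapshot = { ...previous, ...patch };
    this.jobs.set(jobId, next);
    return next;
  }

  onUpdate(e: JobUpdateEvent): JobSnapshot {
    return this.upsert(e.jobId, { status: e.status, progress: e.progress });
  }

  onComplete(e: JobCompleteEvent): JobSnapshot {
    return this.upsert(e.jobId, { status: e.status, transcriptPath: e.transcriptPath });
  }

  // job_error carries no status field, so only the message is recorded.
  onError(e: JobErrorEvent): JobSnapshot {
    return this.upsert(e.jobId, { error: e.error });
  }

  get(jobId: string): JobSnapshot | undefined {
    return this.jobs.get(jobId);
  }
}
```

Wiring it up is one line per event, for example socket.on('job_update', (e) => tracker.onUpdate(e)). Because the tracker has no dependency on Socket.io, it can be tested with plain objects.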
Whisper Configuration
Groq Whisper API (Recommended)
- Ultra-fast, cloud-based (~300x realtime speed)
- Cost: ~$0.04/hour (whisper-large-v3-turbo) or ~$0.111/hour (whisper-large-v3)
- No GPU required
- Models: whisper-large-v3-turbo (fast) or whisper-large-v3 (accurate)
- Set WHISPER_PROVIDER=groq and GROQ_API_KEY
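To make the trade-off between the two Groq models concrete, the figures above can be turned into a rough estimate. The sketch below assumes the quoted prices are per hour of audio and treats the ~300x realtime figure as a constant factor; both are approximations from this document, not guarantees, and the function names are hypothetical.

```typescript
// Approximate prices quoted in this document, assumed to be USD per hour of audio.
const GROQ_USD_PER_AUDIO_HOUR = {
  'whisper-large-v3-turbo': 0.04,
  'whisper-large-v3': 0.111,
} as const;

type GroqModel = keyof typeof GROQ_USD_PER_AUDIO_HOUR;

// Approximate speed-up quoted in this document; real throughput will vary.
const APPROX_REALTIME_FACTOR = 300;

function assertValidDuration(audioSeconds: number): void {
  if (!Number.isFinite(audioSeconds) || audioSeconds < 0) {
    throw new RangeError('audioSeconds must be a non-negative finite number');
  }
}

// Estimated transcription cost in USD for a given audio length.
function estimateGroqCostUsd(model: GroqModel, audioSeconds: number): number {
  assertValidDuration(audioSeconds);
  return (audioSeconds / 3600) * GROQ_USD_PER_AUDIO_HOUR[model];
}

// Estimated transcription time in seconds, excluding download and upload.
function estimateGroqSeconds(audioSeconds: number): number {
  assertValidDuration(audioSeconds);
  return audioSeconds / APPROX_REALTIME_FACTOR;
}
```

Under these assumptions a 90-minute video costs roughly 0.06 USD with whisper-large-v3-turbo and roughly 0.17 USD with whisper-large-v3, and a one-hour video takes on the order of 12 seconds to transcribe once the audio is available.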
Local Whisper
- Free, runs locally
- Requires Python + openai-whisper
- GPU recommended for larger models
- Models: tiny, base, small, medium, large
- Set WHISPER_PROVIDER=local and WHISPER_MODEL
Technology Stack
| Component | Technology |
|---|---|
| Backend | NestJS 10, TypeScript |
| Web | SvelteKit 2, Svelte 5, Tailwind |
| Landing | Astro 4, Tailwind |
| Mobile | Expo 52, React Native, NativeWind |
| YouTube | yt-dlp (via child_process) |
| Transcription | Groq Whisper API / local Whisper |
| Real-time | Socket.io |
| State (Mobile) | Zustand |
Code Patterns
Backend Services
import { Injectable } from '@nestjs/common';

@Injectable()
export class TranscriptionService {
async createJob(dto: TranscribeRequestDto): Promise<TranscriptionJob> {
// Background processing with WebSocket updates
}
}
Web (Svelte 5 Runes)
// Correct - Svelte 5
let jobs = $state<Job[]>([]);
let activeJobs = $derived(jobs.filter(j => j.status === 'active'));
// Wrong - Old Svelte syntax
let jobs = [];
$: activeJobs = jobs.filter(j => j.status === 'active');
Mobile (Zustand)
import { create } from 'zustand';

export const useJobStore = create<JobStore>((set) => ({
jobs: [],
addJob: (job) => set((state) => ({ jobs: [...state.jobs, job] })),
}));
Legacy Python Code
The original Python implementation is preserved in legacy/ for reference:
- transcriber_v4_parallel.py - Main transcription logic
- api_server.py - FastAPI server (replaced by NestJS)
- requirements.txt - Python dependencies
Troubleshooting
yt-dlp not found
# macOS
brew install yt-dlp
# Linux
pip install yt-dlp
Local Whisper not working
# Install Whisper
pip install openai-whisper
# Test
python3 -c "import whisper; print(whisper.available_models())"
Backend can't start
# Check port 3006 and kill the process holding it, if any
lsof -i :3006 && kill -9 $(lsof -t -i:3006)
# Check environment (path is relative to apps/transcriber/)
cat apps/backend/.env