managarten/docs/DATABASE_MIGRATIONS.md
Till-JS 1d44f918c5 fix(manacore-web): add missing packages to Dockerfile
Add shared-pwa, qr-export, and wallpaper-generator packages
to the Docker build context for manacore-web.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-17 13:43:08 +01:00

667 lines
20 KiB
Markdown

# Database Migration Guide
This document describes database migration best practices, procedures, and tooling for the ManaCore monorepo. **This is a core system concept** - all developers should understand these patterns.
## Table of Contents
1. [Overview](#overview)
2. [Drizzle Migration Internals](#drizzle-migration-internals)
3. [Migration Commands](#migration-commands)
4. [Development vs Production](#development-vs-production)
5. [CI/CD Pipeline](#cicd-pipeline)
6. [Advisory Locks](#advisory-locks)
7. [Zero-Downtime Migrations](#zero-downtime-migrations)
8. [Rollback Procedures](#rollback-procedures)
9. [Troubleshooting](#troubleshooting)
---
## Overview
All backends in the ManaCore monorepo use **Drizzle ORM** for database schema management. We use two different approaches depending on the environment:
| Environment | Command | Purpose |
|-------------|---------|---------|
| **Development** | `drizzle-kit push` | Fast iteration, direct schema sync |
| **Production** | `drizzle-kit generate` + `migrate` | Tracked migrations with history |
### Key Principles
1. **Migrations run BEFORE code deployment** - Ensures database is ready for new code
2. **Advisory locks prevent concurrent migrations** - Safe for multi-replica deployments
3. **Expand-contract pattern for breaking changes** - Zero-downtime schema changes
4. **Data persistence** - Migrations never delete user data unless explicitly requested
### Quick Decision Guide
```
┌─────────────────────────────────────────────────────────────────┐
│ Which command should I use? │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Local development? │
│ └── YES → pnpm db:push (fast, no tracking) │
│ │
│ Staging/Production? │
│ └── YES → pnpm db:generate + pnpm db:migrate (tracked) │
│ │
│ Need to inspect data? │
│ └── YES → pnpm db:studio (opens Drizzle Studio) │
│ │
│ Schema changed by someone else? │
│ └── YES → git pull + pnpm db:push (local) │
│ git pull + pnpm db:migrate (staging/prod) │
│ │
└─────────────────────────────────────────────────────────────────┘
```
---
## Drizzle Migration Internals
Understanding how Drizzle manages migrations is essential for debugging issues.
### The Two Modes
#### 1. Push Mode (`drizzle-kit push`)
**How it works:**
1. Drizzle introspects your TypeScript schema files
2. Drizzle introspects the current database schema
3. Drizzle computes the diff between them
4. Drizzle generates and **immediately executes** the SQL to sync them
**Characteristics:**
- No migration files created
- No history tracking
- Direct database modification
- Interactive confirmation (use `--force` to skip)
**When to use:** Local development, experimentation, prototyping
#### 2. Generate + Migrate Mode (`drizzle-kit generate` + `migrate`)
**How it works:**
**Step 1: Generate** (`drizzle-kit generate`)
1. Drizzle introspects your TypeScript schema files
2. Drizzle reads the last snapshot from `migrations/meta/`
3. Drizzle computes the diff
4. Drizzle creates migration files (SQL + snapshot)
**Step 2: Migrate** (`pnpm db:migrate`)
1. Script reads `migrations/meta/_journal.json`
2. Script queries `__drizzle_migrations` table in database
3. Script determines which migrations haven't been applied
4. Script executes pending migrations in order
5. Script records applied migrations in `__drizzle_migrations`
**Characteristics:**
- Creates versioned SQL files
- Full history tracking
- Repeatable deployments
- Can be reviewed before applying
**When to use:** Staging, production, CI/CD pipelines
### Migration File Structure
```
src/db/migrations/
├── 0000_initial_schema/
│ ├── migration.sql # The actual SQL to execute
│ └── snapshot.json # Schema snapshot AFTER this migration
├── 0001_add_user_preferences/
│ ├── migration.sql
│ └── snapshot.json
├── 0002_add_credits_table/
│ ├── migration.sql
│ └── snapshot.json
└── meta/
└── _journal.json # Migration registry (order + metadata)
```
### The Journal File (`_journal.json`)
This file tracks all generated migrations:
```json
{
"version": "7",
"dialect": "postgresql",
"entries": [
{
"idx": 0,
"version": "7",
"when": 1733066521000,
"tag": "0000_initial_schema",
"breakpoints": true
},
{
"idx": 1,
"version": "7",
"when": 1733152921000,
"tag": "0001_add_user_preferences",
"breakpoints": true
}
]
}
```
**Key fields:**
- `idx`: Sequential index (order matters!)
- `tag`: Folder name containing the migration
- `when`: Unix timestamp when generated
- `breakpoints`: Whether to use statement breakpoints
### The Database Tracking Table (`__drizzle_migrations`)
Drizzle creates this table automatically to track applied migrations:
```sql
-- Schema: drizzle
-- Table: __drizzle_migrations
CREATE TABLE drizzle.__drizzle_migrations (
id SERIAL PRIMARY KEY,
hash TEXT NOT NULL,
created_at BIGINT NOT NULL
);
```
**Query applied migrations:**
```sql
SELECT * FROM drizzle.__drizzle_migrations ORDER BY created_at;
```
### How Migration Tracking Works
```
┌─────────────────┐ ┌─────────────────┐
│ _journal.json │ │ __drizzle_ │
│ (filesystem) │ │ migrations (db) │
└────────┬────────┘ └────────┬────────┘
│ │
▼ ▼
[0000, 0001, 0002] [hash_0000, hash_0001]
│ │
└───────────┬───────────┘
Pending: [0002]
Execute 0002/migration.sql
Insert into __drizzle_migrations
```
### Snapshot Files
Each migration includes a `snapshot.json` that captures the **complete schema state** after that migration. This allows Drizzle to:
1. Compute diffs for the next migration
2. Detect schema drift
3. Generate accurate SQL
**Important:** Never modify snapshots manually!
---
## Migration Commands
### All Backends
```bash
# Development - push schema directly (fast, no history)
pnpm db:push
# Generate migration files from schema changes
pnpm db:generate
# Run migrations with advisory locks (production-safe)
pnpm db:migrate
# Open Drizzle Studio for database inspection
pnpm db:studio
```
### Root-Level Commands
```bash
# Setup all databases (creates DBs + pushes schemas)
pnpm setup:db
# Setup specific service
pnpm setup:db:auth
pnpm setup:db:chat
```
### Per-Service Commands
```bash
# mana-core-auth
pnpm --filter mana-core-auth db:push
pnpm --filter mana-core-auth db:generate
pnpm --filter mana-core-auth db:migrate
# chat-backend
pnpm --filter @chat/backend db:push
pnpm --filter @chat/backend db:migrate
```
---
## Development vs Production
### Development Workflow
For local development, use `db:push` for fast iteration:
```bash
# 1. Make schema changes in src/db/schema/*.ts
# 2. Push changes to local database
pnpm db:push
# Or use the full dev command which handles this automatically
pnpm dev:chat:full
```
**Why `push` for development?**
- Instant feedback on schema changes
- No migration file clutter during experimentation
- Automatically handled by `dev:*:full` commands
### Production Workflow
For staging/production, use migration files for trackability:
```bash
# 1. Make schema changes in src/db/schema/*.ts
# 2. Generate migration file
pnpm db:generate --name add_user_preferences
# 3. Review generated SQL
cat src/db/migrations/*/migration.sql
# 4. Commit migration files
git add src/db/migrations/
git commit -m "feat: add user preferences table"
# 5. CI/CD runs migrations automatically on deploy
```
**Why migrations for production?**
- Audit trail of all schema changes
- Repeatable deployments
- Rollback capability (with manual down migrations)
---
## CI/CD Pipeline
### Deployment Flow
```
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Build │───>│ Create DB │───>│ Migrate │───>│ Deploy │
│ Images │ │ (if new) │ │ Database │ │ Code │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
```
### Migration Step Features
1. **Retry logic** - 3 attempts with exponential backoff (10s, 20s, 30s)
2. **Timeout protection** - 5-minute timeout per migration
3. **Advisory locks** - Prevents concurrent migrations
4. **Graceful fallback** - Falls back to `db:push` if `db:migrate` unavailable
### Staging Deployment
Migrations run automatically after database creation:
```yaml
# .github/workflows/cd-staging.yml
- name: Run database migrations
run: |
docker compose exec -T mana-core-auth pnpm run db:migrate
```
### Production Deployment
Migrations run BEFORE deploying new code:
```yaml
# .github/workflows/cd-production.yml
- name: Run database migrations
run: |
docker compose run --rm mana-core-auth pnpm run db:migrate
- name: Deploy with zero-downtime
run: |
docker compose up -d
```
---
## Advisory Locks
Advisory locks prevent multiple instances from running migrations simultaneously.
### How It Works
```typescript
// services/mana-core-auth/src/db/migrate.ts
const MIGRATION_LOCK_ID = 987654321;
// Acquire lock before migration
await db.execute(sql`SELECT pg_try_advisory_lock(${LOCK_ID})`);
// Run migrations...
// Release lock after migration
await db.execute(sql`SELECT pg_advisory_unlock(${LOCK_ID})`);
```
### Lock Behavior
| Scenario | Behavior |
|----------|----------|
| Lock acquired | Migration runs immediately |
| Lock held by another process | Waits up to 5 minutes, then fails |
| Lock stuck | Manual release required (see Troubleshooting) |
### Lock IDs by Service
| Service | Lock ID |
|---------|---------|
| mana-core-auth | `987654321` |
| chat-backend | (to be assigned) |
| todo-backend | (to be assigned) |
### Migration Script Architecture
The production migration script (`src/db/migrate.ts`) is designed for safe, concurrent-safe deployments:
```
┌─────────────────────────────────────────────────────────────────┐
│ migrate.ts Execution Flow │
├─────────────────────────────────────────────────────────────────┤
│ │
│ 1. Load environment variables (.env) │
│ └── DATABASE_URL, MIGRATION_TIMEOUT │
│ │
│ 2. Create single-connection pool │
│ └── max: 1 (dedicated migration connection) │
│ │
│ 3. Test database connectivity (with retry) │
│ └── SELECT 1 (max 3 attempts, exponential backoff) │
│ │
│ 4. Acquire advisory lock │
│ ├── pg_try_advisory_lock() - non-blocking attempt │
│ └── If busy: poll every 5s until timeout (default: 5 min) │
│ │
│ 5. Check for migration files │
│ └── If meta/_journal.json missing: exit gracefully │
│ │
│ 6. Run Drizzle migrations │
│ └── migrate(db, { migrationsFolder }) │
│ │
│ 7. Cleanup (always runs, even on error) │
│ ├── Release advisory lock │
│ └── Close database connection │
│ │
└─────────────────────────────────────────────────────────────────┘
```
**Key Components:**
| Component | Purpose | Configuration |
|-----------|---------|---------------|
| `withRetry()` | Retry transient errors (network, connection) | 3 attempts, exponential backoff |
| `acquireLock()` | Non-blocking lock attempt | `pg_try_advisory_lock()` |
| `waitForLock()` | Polling wait for lock | 5s intervals, configurable timeout |
| `releaseLock()` | Release lock in finally block | Always runs |
**Error Handling:**
```typescript
// Transient errors (will retry):
- ECONNREFUSED, ETIMEDOUT, ENOTFOUND
- Connection errors
- PostgreSQL 57P03 (cannot connect now)
// Non-transient errors (immediate failure):
- Missing DATABASE_URL
- SQL syntax errors
- Schema conflicts
- Lock timeout
```
**Exit Codes:**
| Code | Meaning |
|------|---------|
| 0 | Success - all migrations applied |
| 1 | Failure - check logs for details |
---
## Zero-Downtime Migrations
For breaking schema changes, use the **expand-contract pattern**:
### Phase 1: Expand
Add new schema elements alongside existing ones:
```sql
-- Migration: 001_add_full_name.sql
ALTER TABLE users ADD COLUMN full_name TEXT;
```
### Phase 2: Migrate
Update application to write to both, backfill data:
```typescript
// Application code - dual write
await db.update(users).set({
name: newName, // Old column
fullName: newName, // New column
});
// Backfill script
UPDATE users SET full_name = name WHERE full_name IS NULL;
```
### Phase 3: Contract
After 1-2 weeks, remove old column:
```sql
-- Migration: 002_drop_name_column.sql
ALTER TABLE users DROP COLUMN name;
```
### Common Patterns
| Change Type | Approach |
|-------------|----------|
| Add column | Direct `ALTER TABLE ADD COLUMN` |
| Drop column | Remove from code first, wait 2 weeks, then drop |
| Rename column | Add new → dual-write → backfill → drop old |
| Change type | Add new column → backfill with cast → swap |
| Add NOT NULL | Add nullable → backfill → add constraint |
### Index Creation
Always use `CONCURRENTLY` to avoid table locks:
```sql
-- Good
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);
-- Bad (locks table)
CREATE INDEX idx_users_email ON users(email);
```
---
## Rollback Procedures
### Automatic Rollback (Not Supported)
Drizzle ORM does not support automatic rollbacks. Plan your migrations carefully.
### Manual Rollback
1. **Write down migration scripts** alongside up migrations:
```
src/db/migrations/
├── 001_add_feature.up.sql
├── 001_add_feature.down.sql # Manual rollback script
```
2. **Execute rollback manually**:
```bash
# Connect to database
docker compose exec -T postgres psql -U postgres -d manacore_auth
# Run down migration
\i /path/to/001_add_feature.down.sql
```
### Rollback Checklist
- [ ] Identify affected migration
- [ ] Verify rollback script exists and is tested
- [ ] Create database backup before rollback
- [ ] Execute rollback in staging first
- [ ] Monitor for issues after rollback
- [ ] Update application code if needed
---
## Troubleshooting
### Migration Lock Stuck
If a migration lock is stuck (process crashed without releasing):
```sql
-- Check for stuck locks
SELECT * FROM pg_locks WHERE locktype = 'advisory';
-- Release specific lock (replace LOCK_ID)
SELECT pg_advisory_unlock(987654321);
-- Release all advisory locks for current session
SELECT pg_advisory_unlock_all();
```
### Migration Timeout
If migrations time out:
1. Check for long-running queries: `SELECT * FROM pg_stat_activity;`
2. Increase timeout: `MIGRATION_TIMEOUT=600 pnpm db:migrate`
3. Break large migrations into smaller steps
### Schema Drift
If staging/production schema differs from expected:
```bash
# Generate migration from current schema
pnpm db:generate --name sync_schema
# Review and apply
pnpm db:migrate
```
### Connection Issues
```bash
# Test database connectivity
docker compose exec -T postgres pg_isready -U postgres
# Check environment variables
echo $DATABASE_URL
# Manual connection test
docker compose exec -T postgres psql -U postgres -d manacore_auth -c "SELECT 1"
```
### Migration Fails in CI/CD
1. Check GitHub Actions logs for specific error
2. Verify DATABASE_URL is correctly set in secrets
3. Ensure database exists before migration runs
4. Check if another migration is running (advisory lock)
---
## Best Practices
### DO
- Run migrations before deploying new code
- Test migrations in staging before production
- Use `CONCURRENTLY` for index creation
- Keep migrations small and focused
- Commit migration files to version control
- Wait 1-2 weeks before dropping columns
### DON'T
- Run `db:push` in production
- Delete migration files after they've been applied
- Modify migration files after they've been applied
- Add NOT NULL without default or backfill
- Create indexes without `CONCURRENTLY`
- Drop columns immediately after removing from code
---
## Migration File Structure
```
services/mana-core-auth/
├── src/db/
│ ├── schema/
│ │ ├── index.ts # Export all schemas
│ │ ├── auth.schema.ts # User, session tables
│ │ └── credits.schema.ts # Credit system tables
│ ├── migrations/
│ │ ├── 0001_initial/
│ │ │ ├── snapshot.json
│ │ │ └── migration.sql
│ │ └── meta/
│ │ └── _journal.json # Migration history
│ ├── connection.ts # Database connection
│ └── migrate.ts # Migration script with locks
└── drizzle.config.ts # Drizzle configuration
```
---
## Environment Variables
| Variable | Description | Default |
|----------|-------------|---------|
| `DATABASE_URL` | PostgreSQL connection string | Required |
| `MIGRATION_TIMEOUT` | Max seconds for migration | `300` |
---
## References
- [Drizzle ORM Migrations](https://orm.drizzle.team/docs/migrations)
- [PostgreSQL Advisory Locks](https://www.postgresql.org/docs/current/explicit-locking.html#ADVISORY-LOCKS)
- [Expand-Contract Pattern](https://martinfowler.com/bliki/ParallelChange.html)
- [Zero-Downtime PostgreSQL Migrations](https://postgres.ai/blog/20210923-zero-downtime-postgres-schema-migrations-lock-timeout-and-retries)