Set useJWTPlugin: true so id_tokens are signed with EdDSA keys from JWKS instead of HS256. This fixes Synapse OIDC integration which verifies tokens via JWKS endpoint.
6.7 KiB
Mana Core Auth - Disaster Recovery
Overview
This document describes backup, recovery, and disaster recovery procedures for the Mana Core Auth service.
Data Assets
Critical Data
| Data | Location | Recovery Priority |
|---|---|---|
| User accounts | auth.users table |
Critical |
| Sessions | auth.sessions table |
High (can regenerate) |
| JWKS keys | auth.jwks table |
Critical |
| Organizations | auth.organizations table |
Critical |
| Credit balances | credits.balances table |
Critical |
Non-Critical Data (Can Regenerate)
- Sessions (users can re-login)
- Verification tokens (users can request new ones)
- Rate limit counters (stored in Redis)
Backup Strategy
Database Backups
Automated Daily Backups
#!/bin/bash
# backup-database.sh
BACKUP_DIR="/backups/mana-core-auth"
DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_FILE="${BACKUP_DIR}/manacore_auth_${DATE}.sql.gz"
# Create backup
pg_dump "$DATABASE_URL" | gzip > "$BACKUP_FILE"
# Keep last 30 days
find "$BACKUP_DIR" -name "*.sql.gz" -mtime +30 -delete
# Upload to S3 (optional)
aws s3 cp "$BACKUP_FILE" "s3://your-backup-bucket/mana-core-auth/"
Before Major Changes
Always create a manual backup before:
- Database migrations
- Schema changes
- Bulk data operations
pg_dump "$DATABASE_URL" > pre_migration_backup.sql
Redis Backups (if used)
Redis data is ephemeral (sessions). No backup required, but you can:
# Create RDB snapshot
redis-cli BGSAVE
# Copy dump.rdb to backup location
cp /var/lib/redis/dump.rdb /backups/redis/
JWKS Key Backup
The JWKS keys are critical for JWT validation. Back them up separately:
# Export JWKS keys
psql "$DATABASE_URL" -c "COPY auth.jwks TO '/backups/jwks_backup.csv' CSV HEADER;"
Recovery Procedures
Scenario 1: Database Corruption
-
Stop the service
docker stop mana-core-auth -
Restore from backup
# Drop and recreate database psql -c "DROP DATABASE manacore_auth;" psql -c "CREATE DATABASE manacore_auth;" # Restore backup gunzip -c /backups/manacore_auth_20240201.sql.gz | psql manacore_auth -
Verify data integrity
psql manacore_auth -c "SELECT COUNT(*) FROM auth.users;" psql manacore_auth -c "SELECT COUNT(*) FROM auth.jwks;" -
Restart the service
docker start mana-core-auth -
Verify health
curl http://localhost:3001/health/ready
Scenario 2: JWKS Key Loss
If JWKS keys are lost, all existing JWTs become invalid.
-
Option A: Restore from backup
psql "$DATABASE_URL" -c "COPY auth.jwks FROM '/backups/jwks_backup.csv' CSV HEADER;" -
Option B: Generate new keys (forces all users to re-login)
# Better Auth will auto-generate new keys on startup # All existing sessions will be invalidated docker restart mana-core-auth -
Notify affected services
- All services caching the old JWKS need to refresh
- Users will need to log in again
Scenario 3: Complete Service Failure
-
Provision new infrastructure
- New database instance
- New Redis instance (if used)
- New compute instance
-
Restore database
# Create database psql -c "CREATE DATABASE manacore_auth;" # Restore latest backup gunzip -c /backups/latest.sql.gz | psql manacore_auth -
Update DNS/Load Balancer
- Point to new service instance
-
Verify all integrations
- Check OIDC clients can authenticate
- Check other services can validate tokens
Scenario 4: Accidental Data Deletion
-
Identify affected data
-- Check what's missing SELECT COUNT(*) FROM auth.users WHERE deleted_at IS NOT NULL; -
Restore from point-in-time backup
# If using PostgreSQL with WAL archiving pg_restore --target-time="2024-02-01 10:00:00" backup.dump -
Selective restore
-- Restore specific users from backup database INSERT INTO auth.users SELECT * FROM backup_db.auth.users WHERE id IN ('user1', 'user2');
Key Rotation
Scheduled Key Rotation
JWKS keys should be rotated periodically (recommended: every 90 days).
-
Generate new key
# Better Auth handles this automatically # Or manually via database -
Keep old key for grace period
- Old tokens remain valid until expiry
- New tokens use new key
-
Remove old key after grace period
DELETE FROM auth.jwks WHERE created_at < NOW() - INTERVAL '7 days' AND id != (SELECT id FROM auth.jwks ORDER BY created_at DESC LIMIT 1);
Emergency Key Rotation
If keys are compromised:
-
Immediately revoke old keys
DELETE FROM auth.jwks; -
Restart service to generate new keys
docker restart mana-core-auth -
Notify all integrated services
- They need to refresh their JWKS cache
- All users will need to re-authenticate
Monitoring & Alerts
Critical Alerts
Set up alerts for:
-
Backup failures
- Backup script exit code != 0
- Backup file size = 0
-
Database health
- Connection failures
- Replication lag (if applicable)
-
Service health
- /health/ready returning non-200
- High error rate
Recovery Time Objectives
| Scenario | RTO | RPO |
|---|---|---|
| Service restart | 5 min | 0 |
| Database restore | 30 min | 24h (daily backup) |
| Complete rebuild | 2 hours | 24h |
Runbook
Daily Operations
- Verify backup completed
- Check monitoring dashboards
- Review error logs
Weekly Operations
- Test backup restoration (staging)
- Review security logs
- Check disk space
Monthly Operations
- Full disaster recovery drill
- Review and update this document
- Verify all contact information is current
Contact Information
| Role | Contact |
|---|---|
| On-call Engineer | oncall@yourcompany.com |
| Database Admin | dba@yourcompany.com |
| Security Team | security@yourcompany.com |
Appendix: SQL Scripts
Verify Data Integrity
-- Check user count
SELECT COUNT(*) as total_users FROM auth.users;
-- Check for orphaned data
SELECT COUNT(*) as orphaned_sessions
FROM auth.sessions s
LEFT JOIN auth.users u ON s.user_id = u.id
WHERE u.id IS NULL;
-- Check JWKS keys
SELECT id, created_at FROM auth.jwks ORDER BY created_at DESC;
-- Check credit balances
SELECT COUNT(*) as users_with_balance
FROM credits.balances;
Emergency Cleanup
-- Clear expired sessions
DELETE FROM auth.sessions WHERE expires_at < NOW();
-- Clear expired verification tokens
DELETE FROM auth.verification_tokens WHERE expires_at < NOW();