mirror of
https://github.com/Memo-2023/mana-monorepo.git
synced 2026-05-14 23:41:08 +02:00
2816 lines
81 KiB
Markdown
2816 lines
81 KiB
Markdown
# Manacore Monorepo - Deployment Architecture
|
|
|
|
**Version:** 1.0
|
|
**Date:** 2025-11-27
|
|
**Author:** Hive Mind Swarm Analyst
|
|
|
|
---
|
|
|
|
## Table of Contents
|
|
|
|
1. [Executive Summary](#executive-summary)
|
|
2. [System Inventory](#system-inventory)
|
|
3. [Container Architecture](#container-architecture)
|
|
4. [Service Orchestration](#service-orchestration)
|
|
5. [Deployment Topology](#deployment-topology)
|
|
6. [Data Architecture](#data-architecture)
|
|
7. [Network Architecture](#network-architecture)
|
|
8. [Environment Configuration Matrix](#environment-configuration-matrix)
|
|
9. [Monitoring & Observability](#monitoring--observability)
|
|
10. [CI/CD Pipeline](#cicd-pipeline)
|
|
11. [Disaster Recovery](#disaster-recovery)
|
|
12. [Security Hardening](#security-hardening)
|
|
|
|
---
|
|
|
|
## Executive Summary
|
|
|
|
The manacore-monorepo contains **10 product projects** with **37 deployable services** across multiple technology stacks:
|
|
|
|
- **10 NestJS backend APIs** (Node.js microservices)
|
|
- **9 SvelteKit web applications** (SSR/SSG)
|
|
- **9 Astro landing pages** (static sites)
|
|
- **8 Expo mobile apps** (served via CDN for OTA updates)
|
|
- **1 Central authentication service** (mana-core-auth)
|
|
|
|
**Key Architectural Decisions:**
|
|
|
|
- **Per-project container isolation** for independent scaling
|
|
- **Shared infrastructure** for databases (PostgreSQL) and caching (Redis)
|
|
- **Multi-stage Docker builds** optimized for pnpm workspace monorepo
|
|
- **Blue-green deployment** strategy with zero-downtime rollbacks
|
|
- **Docker Compose orchestration** with GitHub Container Registry
|
|
- **CDN-first static assets** (Astro landing pages, mobile OTA bundles)
|
|
|
|
---
|
|
|
|
## System Inventory
|
|
|
|
### Complete Service Matrix
|
|
|
|
| Project | Backend (NestJS) | Web (SvelteKit) | Landing (Astro) | Mobile (Expo) | Port Range |
|
|
|---------|------------------|-----------------|-----------------|---------------|------------|
|
|
| **mana-core-auth** | ✅ 3001 | ❌ | ❌ | ❌ | 3001 |
|
|
| **chat** | ✅ 3002 | ✅ | ✅ | ✅ | 3002-3005 |
|
|
| **maerchenzauber** | ✅ 3003 | ✅ | ✅ | ✅ | 3010-3013 |
|
|
| **manadeck** | ✅ 3004 | ✅ | ✅ | ✅ | 3020-3023 |
|
|
| **memoro** | ❌ | ✅ | ✅ | ✅ | 3030-3032 |
|
|
| **manacore** | ❌ | ✅ | ✅ | ✅ | 3040-3042 |
|
|
| **picture** | ✅ 3005 | ✅ | ✅ | ✅ | 3050-3053 |
|
|
| **uload** | ✅ 3006 | ✅ | ✅ | ❌ | 3060-3062 |
|
|
| **nutriphi** | ✅ 3007 | ✅ | ✅ | ✅ | 3070-3073 |
|
|
| **news** | ✅ 3008 (api) | ✅ | ✅ | ❌ | 3080-3082 |
|
|
|
|
**Total Deployable Services:** 37 containers + 2 shared infrastructure (PostgreSQL, Redis)
|
|
|
|
### Technology Stack Breakdown
|
|
|
|
#### Backend (NestJS) - 10 services
|
|
- **Node.js:** 20 LTS
|
|
- **Framework:** NestJS 10-11
|
|
- **Database:** Drizzle ORM + PostgreSQL
|
|
- **Runtime:** Node.js process (no PM2 needed in containers)
|
|
|
|
#### Web (SvelteKit) - 9 services
|
|
- **Node.js:** 20 LTS
|
|
- **Framework:** SvelteKit 2.x + Svelte 5 (runes mode)
|
|
- **Adapter:** `@sveltejs/adapter-node` for Docker or `@sveltejs/adapter-netlify` for Netlify
|
|
- **Build output:** SSR Node server
|
|
|
|
#### Landing (Astro) - 9 services
|
|
- **Framework:** Astro 5.x
|
|
- **Build output:** Static files (HTML/CSS/JS)
|
|
- **Deployment:** CDN (Cloudflare, Netlify, Vercel) or Nginx container
|
|
|
|
#### Mobile (Expo) - 8 services
|
|
- **Framework:** React Native + Expo SDK 52-54
|
|
- **Deployment:**
|
|
- **OTA Updates:** EAS Update (served from CDN)
|
|
- **Binaries:** App Store / Google Play Store
|
|
- **Dev:** Expo Go or custom dev client
|
|
|
|
### Shared Packages (19 packages)
|
|
|
|
All shared packages must be built before deployment:
|
|
|
|
```
|
|
packages/shared-auth
|
|
packages/shared-auth-ui
|
|
packages/shared-branding
|
|
packages/shared-errors
|
|
packages/shared-i18n
|
|
packages/shared-supabase
|
|
packages/shared-types
|
|
packages/shared-utils
|
|
... (19 total)
|
|
```
|
|
|
|
---
|
|
|
|
## Container Architecture
|
|
|
|
### 1. Dockerfile Strategy
|
|
|
|
#### 1.1 NestJS Backend Template
|
|
|
|
**File:** `docker/templates/Dockerfile.nestjs`
|
|
|
|
```dockerfile
|
|
# =============================================================================
|
|
# Multi-stage Dockerfile for NestJS Backend (Monorepo-optimized)
|
|
# Build from monorepo root with context=.
|
|
# =============================================================================
|
|
|
|
# -----------------------------------------------------------------------------
|
|
# Stage 1: Base - Install pnpm and prepare workspace
|
|
# -----------------------------------------------------------------------------
|
|
FROM node:20-alpine AS base
|
|
|
|
# Enable corepack for pnpm
|
|
RUN corepack enable && corepack prepare pnpm@9.15.0 --activate
|
|
|
|
WORKDIR /app
|
|
|
|
# Copy workspace configuration
|
|
COPY pnpm-workspace.yaml package.json pnpm-lock.yaml ./
|
|
|
|
# -----------------------------------------------------------------------------
|
|
# Stage 2: Dependencies - Install all dependencies
|
|
# -----------------------------------------------------------------------------
|
|
FROM base AS dependencies
|
|
|
|
# Copy all package.json files (for dependency resolution)
|
|
COPY packages/*/package.json ./packages/
|
|
COPY apps/*/apps/*/package.json ./apps/
|
|
COPY services/*/package.json ./services/
|
|
|
|
# Install all dependencies (frozen lockfile for reproducibility)
|
|
RUN pnpm install --frozen-lockfile --filter=@PROJECT/backend...
|
|
|
|
# -----------------------------------------------------------------------------
|
|
# Stage 3: Builder - Build shared packages and backend
|
|
# -----------------------------------------------------------------------------
|
|
FROM dependencies AS builder
|
|
|
|
# Copy source code for shared packages
|
|
COPY packages/ ./packages/
|
|
|
|
# Build shared packages (Turborepo cache)
|
|
RUN pnpm --filter '@manacore/shared-*' build
|
|
|
|
# Copy backend source
|
|
ARG PROJECT_PATH
|
|
COPY ${PROJECT_PATH} ./${PROJECT_PATH}
|
|
|
|
# Build backend
|
|
WORKDIR /app/${PROJECT_PATH}
|
|
RUN pnpm build
|
|
|
|
# -----------------------------------------------------------------------------
|
|
# Stage 4: Production - Minimal runtime image
|
|
# -----------------------------------------------------------------------------
|
|
FROM node:20-alpine AS production
|
|
|
|
# Security: Non-root user
|
|
RUN addgroup -g 1001 nodejs && adduser -u 1001 -G nodejs -s /bin/sh -D nodejs
|
|
|
|
# Install runtime dependencies only (for health checks, migrations)
|
|
RUN apk add --no-cache postgresql-client wget
|
|
|
|
WORKDIR /app
|
|
|
|
# Copy built artifacts
|
|
COPY --from=builder --chown=nodejs:nodejs /app/node_modules ./node_modules
|
|
COPY --from=builder --chown=nodejs:nodejs /app/packages ./packages
|
|
COPY --from=builder --chown=nodejs:nodejs /app/${PROJECT_PATH}/dist ./dist
|
|
COPY --from=builder --chown=nodejs:nodejs /app/${PROJECT_PATH}/package.json ./
|
|
|
|
# Environment
|
|
ENV NODE_ENV=production
|
|
ENV PORT=3000
|
|
|
|
# Health check
|
|
HEALTHCHECK --interval=30s --timeout=10s --start-period=10s --retries=3 \
|
|
CMD wget --no-verbose --tries=1 --spider http://localhost:${PORT}/api/health || exit 1
|
|
|
|
# Switch to non-root user
|
|
USER nodejs
|
|
|
|
EXPOSE ${PORT}
|
|
|
|
# Start server
|
|
CMD ["node", "dist/main.js"]
|
|
```
|
|
|
|
**Build Arguments:**
|
|
- `PROJECT_PATH`: e.g., `apps/chat/apps/backend`
|
|
- `PORT`: Service port (default: 3000)
|
|
|
|
**Example Build:**
|
|
```bash
|
|
docker build \
|
|
--build-arg PROJECT_PATH=apps/chat/apps/backend \
|
|
--build-arg PORT=3002 \
|
|
-t chat-backend:latest \
|
|
-f docker/templates/Dockerfile.nestjs \
|
|
.
|
|
```
|
|
|
|
---
|
|
|
|
#### 1.2 SvelteKit Web Template
|
|
|
|
**File:** `docker/templates/Dockerfile.sveltekit`
|
|
|
|
```dockerfile
|
|
# =============================================================================
|
|
# Multi-stage Dockerfile for SvelteKit Web App (Monorepo-optimized)
|
|
# Build from monorepo root with context=.
|
|
# =============================================================================
|
|
|
|
# -----------------------------------------------------------------------------
|
|
# Stage 1: Base - Install pnpm and prepare workspace
|
|
# -----------------------------------------------------------------------------
|
|
FROM node:20-alpine AS base
|
|
|
|
RUN corepack enable && corepack prepare pnpm@9.15.0 --activate
|
|
|
|
WORKDIR /app
|
|
|
|
COPY pnpm-workspace.yaml package.json pnpm-lock.yaml ./
|
|
|
|
# -----------------------------------------------------------------------------
|
|
# Stage 2: Dependencies
|
|
# -----------------------------------------------------------------------------
|
|
FROM base AS dependencies
|
|
|
|
COPY packages/*/package.json ./packages/
|
|
COPY apps/*/apps/*/package.json ./apps/
|
|
|
|
ARG PROJECT_PATH
|
|
RUN pnpm install --frozen-lockfile --filter=${PROJECT_PATH}...
|
|
|
|
# -----------------------------------------------------------------------------
|
|
# Stage 3: Builder
|
|
# -----------------------------------------------------------------------------
|
|
FROM dependencies AS builder
|
|
|
|
# Copy shared packages source
|
|
COPY packages/ ./packages/
|
|
|
|
# Build shared packages
|
|
RUN pnpm --filter '@manacore/shared-*' build
|
|
|
|
# Copy web app source
|
|
ARG PROJECT_PATH
|
|
COPY ${PROJECT_PATH} ./${PROJECT_PATH}
|
|
|
|
WORKDIR /app/${PROJECT_PATH}
|
|
|
|
# Build SvelteKit app (adapter-node output)
|
|
RUN pnpm build
|
|
|
|
# -----------------------------------------------------------------------------
|
|
# Stage 4: Production
|
|
# -----------------------------------------------------------------------------
|
|
FROM node:20-alpine AS production
|
|
|
|
RUN addgroup -g 1001 nodejs && adduser -u 1001 -G nodejs -s /bin/sh -D nodejs
|
|
|
|
WORKDIR /app
|
|
|
|
ARG PROJECT_PATH
|
|
COPY --from=builder --chown=nodejs:nodejs /app/${PROJECT_PATH}/build ./build
|
|
COPY --from=builder --chown=nodejs:nodejs /app/${PROJECT_PATH}/package.json ./
|
|
COPY --from=builder --chown=nodejs:nodejs /app/node_modules ./node_modules
|
|
|
|
ENV NODE_ENV=production
|
|
ENV PORT=3000
|
|
ENV HOST=0.0.0.0
|
|
|
|
HEALTHCHECK --interval=30s --timeout=5s --start-period=5s --retries=3 \
|
|
CMD wget --no-verbose --tries=1 --spider http://localhost:${PORT}/api/health || exit 1
|
|
|
|
USER nodejs
|
|
|
|
EXPOSE ${PORT}
|
|
|
|
CMD ["node", "build"]
|
|
```
|
|
|
|
**Notes:**
|
|
- Requires `@sveltejs/adapter-node` in `svelte.config.js`
|
|
- Replace Netlify adapter with Node adapter for Docker deployment
|
|
|
|
---
|
|
|
|
#### 1.3 Astro Landing Page Template
|
|
|
|
**File:** `docker/templates/Dockerfile.astro`
|
|
|
|
```dockerfile
|
|
# =============================================================================
|
|
# Multi-stage Dockerfile for Astro Landing Page (Static Site)
|
|
# Serves via Nginx for production
|
|
# =============================================================================
|
|
|
|
# -----------------------------------------------------------------------------
|
|
# Stage 1: Builder
|
|
# -----------------------------------------------------------------------------
|
|
FROM node:20-alpine AS builder
|
|
|
|
RUN corepack enable && corepack prepare pnpm@9.15.0 --activate
|
|
|
|
WORKDIR /app
|
|
|
|
COPY pnpm-workspace.yaml package.json pnpm-lock.yaml ./
|
|
COPY packages/*/package.json ./packages/
|
|
COPY apps/*/apps/*/package.json ./apps/
|
|
|
|
ARG PROJECT_PATH
|
|
RUN pnpm install --frozen-lockfile --filter=${PROJECT_PATH}...
|
|
|
|
COPY packages/ ./packages/
|
|
RUN pnpm --filter '@manacore/shared-landing-ui' build
|
|
|
|
COPY ${PROJECT_PATH} ./${PROJECT_PATH}
|
|
|
|
WORKDIR /app/${PROJECT_PATH}
|
|
RUN pnpm build
|
|
|
|
# -----------------------------------------------------------------------------
|
|
# Stage 2: Nginx Server
|
|
# -----------------------------------------------------------------------------
|
|
FROM nginx:1.25-alpine AS production
|
|
|
|
# Copy built static files
|
|
ARG PROJECT_PATH
|
|
COPY --from=builder /app/${PROJECT_PATH}/dist /usr/share/nginx/html
|
|
|
|
# Copy custom Nginx config (optional)
|
|
COPY docker/templates/nginx.conf /etc/nginx/nginx.conf
|
|
|
|
# Health check
|
|
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
|
|
CMD wget --no-verbose --tries=1 --spider http://localhost:80/health || exit 1
|
|
|
|
EXPOSE 80
|
|
|
|
CMD ["nginx", "-g", "daemon off;"]
|
|
```
|
|
|
|
**Nginx Configuration:**
|
|
|
|
```nginx
|
|
# docker/templates/nginx.conf
|
|
worker_processes auto;
|
|
events { worker_connections 1024; }
|
|
|
|
http {
|
|
include /etc/nginx/mime.types;
|
|
default_type application/octet-stream;
|
|
|
|
sendfile on;
|
|
tcp_nopush on;
|
|
tcp_nodelay on;
|
|
keepalive_timeout 65;
|
|
gzip on;
|
|
gzip_types text/plain text/css application/json application/javascript text/xml application/xml;
|
|
|
|
server {
|
|
listen 80;
|
|
server_name _;
|
|
root /usr/share/nginx/html;
|
|
index index.html;
|
|
|
|
# Cache static assets
|
|
location ~* \.(js|css|png|jpg|jpeg|gif|ico|svg|woff|woff2|ttf|eot)$ {
|
|
expires 1y;
|
|
add_header Cache-Control "public, immutable";
|
|
}
|
|
|
|
# SPA fallback
|
|
location / {
|
|
try_files $uri $uri/ /index.html;
|
|
}
|
|
|
|
# Health check endpoint
|
|
location /health {
|
|
return 200 "OK";
|
|
add_header Content-Type text/plain;
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
---
|
|
|
|
### 2. Base Image Selection
|
|
|
|
| App Type | Base Image | Size | Rationale |
|
|
|----------|------------|------|-----------|
|
|
| **NestJS** | `node:20-alpine` | ~120MB | Minimal footprint, security updates |
|
|
| **SvelteKit** | `node:20-alpine` | ~120MB | Same as NestJS |
|
|
| **Astro** | `nginx:1.25-alpine` | ~40MB | Static files, ultra-fast |
|
|
| **PostgreSQL** | `postgres:16-alpine` | ~230MB | Official, stable |
|
|
| **Redis** | `redis:7-alpine` | ~40MB | Official, minimal |
|
|
|
|
**Why Alpine Linux:**
|
|
- 5x smaller than Debian-based images
|
|
- Fewer attack vectors (minimal packages)
|
|
- Faster pull times
|
|
- Security-hardened by default
|
|
|
|
---
|
|
|
|
### 3. Layer Caching Strategy
|
|
|
|
**Key Optimization:** Leverage Docker layer cache + pnpm's efficient workspace handling.
|
|
|
|
**Cache Layers (in order):**
|
|
|
|
1. **OS & System Packages** (changes rarely)
|
|
```dockerfile
|
|
FROM node:20-alpine
|
|
RUN corepack enable && corepack prepare pnpm@9.15.0 --activate
|
|
```
|
|
|
|
2. **Workspace Configuration** (changes when adding/removing packages)
|
|
```dockerfile
|
|
COPY pnpm-workspace.yaml package.json pnpm-lock.yaml ./
|
|
```
|
|
|
|
3. **Package Manifests** (changes when dependencies update)
|
|
```dockerfile
|
|
COPY packages/*/package.json ./packages/
|
|
COPY apps/*/apps/*/package.json ./apps/
|
|
```
|
|
|
|
4. **Dependency Installation** (cache hit ~80% of builds)
|
|
```dockerfile
|
|
RUN pnpm install --frozen-lockfile
|
|
```
|
|
|
|
5. **Source Code** (changes every build)
|
|
```dockerfile
|
|
COPY packages/ ./packages/
|
|
COPY apps/chat/apps/backend ./apps/chat/apps/backend
|
|
```
|
|
|
|
**Build Time Optimization:**
|
|
- **Without cache:** ~10-15 minutes (full dependency install)
|
|
- **With cache:** ~2-3 minutes (only rebuild changed layers)
|
|
|
|
---
|
|
|
|
### 4. Security Hardening
|
|
|
|
#### Non-Root User Execution
|
|
|
|
All containers run as unprivileged user (UID 1001):
|
|
|
|
```dockerfile
|
|
RUN addgroup -g 1001 nodejs && adduser -u 1001 -G nodejs -s /bin/sh -D nodejs
|
|
USER nodejs
|
|
```
|
|
|
|
#### Read-Only Root Filesystem
|
|
|
|
```yaml
|
|
# docker-compose.yml
|
|
security_opt:
|
|
- no-new-privileges:true
|
|
read_only: true
|
|
tmpfs:
|
|
- /tmp
|
|
- /app/.cache
|
|
```
|
|
|
|
#### Minimal Runtime Dependencies
|
|
|
|
```dockerfile
|
|
# Only install essential tools
|
|
RUN apk add --no-cache postgresql-client wget
|
|
```
|
|
|
|
#### Vulnerability Scanning
|
|
|
|
```bash
|
|
# Scan images with Trivy
|
|
trivy image chat-backend:latest --severity HIGH,CRITICAL
|
|
```
|
|
|
|
---
|
|
|
|
## Service Orchestration
|
|
|
|
### 1. Docker Compose for Local Development
|
|
|
|
**File:** `docker-compose.dev.yml` (already exists, enhance it)
|
|
|
|
```yaml
|
|
# Enhanced Development Docker Compose
|
|
version: '3.9'
|
|
|
|
services:
|
|
# ============================================================================
|
|
# Shared Infrastructure
|
|
# ============================================================================
|
|
|
|
postgres:
|
|
image: postgres:16-alpine
|
|
container_name: manacore-postgres
|
|
restart: unless-stopped
|
|
environment:
|
|
POSTGRES_DB: manacore
|
|
POSTGRES_USER: ${POSTGRES_USER:-manacore}
|
|
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-devpassword}
|
|
volumes:
|
|
- postgres-data:/var/lib/postgresql/data
|
|
- ./docker/init-db:/docker-entrypoint-initdb.d:ro
|
|
ports:
|
|
- "5432:5432"
|
|
networks:
|
|
- manacore-network
|
|
healthcheck:
|
|
test: ["CMD-SHELL", "pg_isready -U manacore"]
|
|
interval: 10s
|
|
timeout: 5s
|
|
retries: 5
|
|
|
|
redis:
|
|
image: redis:7-alpine
|
|
container_name: manacore-redis
|
|
restart: unless-stopped
|
|
command: redis-server --requirepass ${REDIS_PASSWORD:-devpassword} --maxmemory 256mb --maxmemory-policy allkeys-lru
|
|
volumes:
|
|
- redis-data:/data
|
|
ports:
|
|
- "6379:6379"
|
|
networks:
|
|
- manacore-network
|
|
healthcheck:
|
|
test: ["CMD", "redis-cli", "-a", "${REDIS_PASSWORD:-devpassword}", "ping"]
|
|
interval: 10s
|
|
timeout: 5s
|
|
retries: 3
|
|
|
|
# ============================================================================
|
|
# Mana Core Auth Service
|
|
# ============================================================================
|
|
|
|
mana-core-auth:
|
|
profiles: ["auth", "all"]
|
|
build:
|
|
context: .
|
|
dockerfile: ./services/mana-core-auth/Dockerfile
|
|
container_name: manacore-auth
|
|
restart: unless-stopped
|
|
environment:
|
|
NODE_ENV: development
|
|
PORT: 3001
|
|
DATABASE_URL: postgresql://manacore:devpassword@postgres:5432/manacore
|
|
REDIS_HOST: redis
|
|
REDIS_PORT: 6379
|
|
REDIS_PASSWORD: ${REDIS_PASSWORD:-devpassword}
|
|
JWT_PUBLIC_KEY: ${JWT_PUBLIC_KEY}
|
|
JWT_PRIVATE_KEY: ${JWT_PRIVATE_KEY}
|
|
depends_on:
|
|
postgres:
|
|
condition: service_healthy
|
|
redis:
|
|
condition: service_healthy
|
|
ports:
|
|
- "3001:3001"
|
|
networks:
|
|
- manacore-network
|
|
labels:
|
|
- "com.manacore.service=auth"
|
|
- "com.manacore.tier=infrastructure"
|
|
|
|
# ============================================================================
|
|
# Project Backends (NestJS)
|
|
# ============================================================================
|
|
|
|
chat-backend:
|
|
profiles: ["chat", "all"]
|
|
build:
|
|
context: .
|
|
dockerfile: ./apps/chat/apps/backend/Dockerfile
|
|
container_name: chat-backend
|
|
restart: unless-stopped
|
|
environment:
|
|
NODE_ENV: development
|
|
PORT: 3002
|
|
DATABASE_URL: postgresql://manacore:devpassword@postgres:5432/chat
|
|
AZURE_OPENAI_ENDPOINT: ${AZURE_OPENAI_ENDPOINT}
|
|
AZURE_OPENAI_API_KEY: ${AZURE_OPENAI_API_KEY}
|
|
MANA_CORE_AUTH_URL: http://mana-core-auth:3001
|
|
depends_on:
|
|
postgres:
|
|
condition: service_healthy
|
|
mana-core-auth:
|
|
condition: service_started
|
|
ports:
|
|
- "3002:3002"
|
|
networks:
|
|
- manacore-network
|
|
labels:
|
|
- "com.manacore.project=chat"
|
|
- "com.manacore.service=backend"
|
|
|
|
maerchenzauber-backend:
|
|
profiles: ["maerchenzauber", "all"]
|
|
build:
|
|
context: .
|
|
dockerfile: ./apps/maerchenzauber/apps/backend/Dockerfile
|
|
container_name: maerchenzauber-backend
|
|
restart: unless-stopped
|
|
environment:
|
|
NODE_ENV: development
|
|
PORT: 3003
|
|
DATABASE_URL: postgresql://manacore:devpassword@postgres:5432/maerchenzauber
|
|
SUPABASE_URL: ${MAERCHENZAUBER_SUPABASE_URL}
|
|
SUPABASE_ANON_KEY: ${MAERCHENZAUBER_SUPABASE_ANON_KEY}
|
|
depends_on:
|
|
postgres:
|
|
condition: service_healthy
|
|
ports:
|
|
- "3003:3003"
|
|
networks:
|
|
- manacore-network
|
|
labels:
|
|
- "com.manacore.project=maerchenzauber"
|
|
- "com.manacore.service=backend"
|
|
|
|
# ============================================================================
|
|
# Web Apps (SvelteKit) - Behind Traefik Reverse Proxy
|
|
# ============================================================================
|
|
|
|
chat-web:
|
|
profiles: ["chat", "all"]
|
|
build:
|
|
context: .
|
|
dockerfile: docker/templates/Dockerfile.sveltekit
|
|
args:
|
|
PROJECT_PATH: apps/chat/apps/web
|
|
container_name: chat-web
|
|
restart: unless-stopped
|
|
environment:
|
|
NODE_ENV: production
|
|
PORT: 3000
|
|
PUBLIC_BACKEND_URL: http://chat-backend:3002
|
|
ports:
|
|
- "3100:3000"
|
|
networks:
|
|
- manacore-network
|
|
labels:
|
|
- "com.manacore.project=chat"
|
|
- "com.manacore.service=web"
|
|
- "traefik.enable=true"
|
|
- "traefik.http.routers.chat-web.rule=Host(`chat.localhost`)"
|
|
|
|
# ============================================================================
|
|
# Landing Pages (Astro) - Nginx Static
|
|
# ============================================================================
|
|
|
|
chat-landing:
|
|
profiles: ["chat", "all"]
|
|
build:
|
|
context: .
|
|
dockerfile: docker/templates/Dockerfile.astro
|
|
args:
|
|
PROJECT_PATH: apps/chat/apps/landing
|
|
container_name: chat-landing
|
|
restart: unless-stopped
|
|
ports:
|
|
- "3200:80"
|
|
networks:
|
|
- manacore-network
|
|
labels:
|
|
- "com.manacore.project=chat"
|
|
- "com.manacore.service=landing"
|
|
|
|
# ============================================================================
|
|
# Reverse Proxy (Optional for local dev)
|
|
# ============================================================================
|
|
|
|
traefik:
|
|
profiles: ["proxy", "all"]
|
|
image: traefik:v2.11
|
|
container_name: manacore-traefik
|
|
command:
|
|
- "--api.insecure=true"
|
|
- "--providers.docker=true"
|
|
- "--providers.docker.exposedbydefault=false"
|
|
- "--entrypoints.web.address=:80"
|
|
ports:
|
|
- "80:80"
|
|
- "8080:8080" # Traefik dashboard
|
|
volumes:
|
|
- /var/run/docker.sock:/var/run/docker.sock:ro
|
|
networks:
|
|
- manacore-network
|
|
|
|
networks:
|
|
manacore-network:
|
|
driver: bridge
|
|
|
|
volumes:
|
|
postgres-data:
|
|
redis-data:
|
|
```
|
|
|
|
**Usage:**
|
|
|
|
```bash
|
|
# Start only infrastructure (PostgreSQL + Redis)
|
|
pnpm docker:up
|
|
|
|
# Start auth service
|
|
pnpm docker:up:auth
|
|
|
|
# Start specific project (chat)
|
|
docker compose --profile chat up -d
|
|
|
|
# Start everything
|
|
pnpm docker:up:all
|
|
|
|
# View logs
|
|
pnpm docker:logs:chat
|
|
|
|
# Stop all
|
|
pnpm docker:down
|
|
```
|
|
|
|
---
|
|
|
|
### 2. Production Orchestration (Coolify)
|
|
|
|
**Coolify Configuration:** `.coolify/docker-compose.prod.yml`
|
|
|
|
```yaml
|
|
version: '3.9'
|
|
|
|
# Production Docker Compose for Coolify Deployment
|
|
# Coolify will handle:
|
|
# - Automatic SSL (Let's Encrypt)
|
|
# - Health check monitoring
|
|
# - Auto-restart on failure
|
|
# - Log aggregation
|
|
# - Resource limits
|
|
|
|
services:
|
|
chat-backend:
|
|
image: ${DOCKER_REGISTRY}/chat-backend:${VERSION}
|
|
restart: always
|
|
environment:
|
|
NODE_ENV: production
|
|
PORT: 3002
|
|
DATABASE_URL: ${CHAT_DATABASE_URL}
|
|
AZURE_OPENAI_ENDPOINT: ${AZURE_OPENAI_ENDPOINT}
|
|
AZURE_OPENAI_API_KEY: ${AZURE_OPENAI_API_KEY}
|
|
deploy:
|
|
resources:
|
|
limits:
|
|
cpus: '1.0'
|
|
memory: 512M
|
|
reservations:
|
|
cpus: '0.5'
|
|
memory: 256M
|
|
healthcheck:
|
|
test: ["CMD", "wget", "--spider", "-q", "http://localhost:3002/api/health"]
|
|
interval: 30s
|
|
timeout: 10s
|
|
retries: 3
|
|
start_period: 40s
|
|
labels:
|
|
- "coolify.managed=true"
|
|
- "coolify.project=chat"
|
|
- "coolify.service=backend"
|
|
- "coolify.port=3002"
|
|
- "coolify.domain=api-chat.manacore.app"
|
|
```
|
|
|
|
**Coolify Deployment Strategy:**
|
|
|
|
1. **Per-project services**: Each project (chat, maerchenzauber, etc.) deployed as separate Coolify application
|
|
2. **Resource pools**: Shared PostgreSQL and Redis as Coolify resources
|
|
3. **Auto-scaling**: Configure horizontal scaling based on CPU/memory
|
|
4. **Blue-green deployments**: Coolify's native zero-downtime deployment
|
|
|
|
---
|
|
|
|
### 3. Kubernetes (Future-Proof Option)
|
|
|
|
**File:** `k8s/base/deployment.yaml` (template)
|
|
|
|
```yaml
|
|
apiVersion: apps/v1
|
|
kind: Deployment
|
|
metadata:
|
|
name: chat-backend
|
|
namespace: manacore
|
|
labels:
|
|
app: chat
|
|
component: backend
|
|
tier: api
|
|
spec:
|
|
replicas: 2
|
|
strategy:
|
|
type: RollingUpdate
|
|
rollingUpdate:
|
|
maxSurge: 1
|
|
maxUnavailable: 0
|
|
selector:
|
|
matchLabels:
|
|
app: chat
|
|
component: backend
|
|
template:
|
|
metadata:
|
|
labels:
|
|
app: chat
|
|
component: backend
|
|
spec:
|
|
securityContext:
|
|
runAsNonRoot: true
|
|
runAsUser: 1001
|
|
fsGroup: 1001
|
|
containers:
|
|
- name: chat-backend
|
|
image: registry.manacore.app/chat-backend:latest
|
|
imagePullPolicy: Always
|
|
ports:
|
|
- containerPort: 3002
|
|
name: http
|
|
protocol: TCP
|
|
env:
|
|
- name: NODE_ENV
|
|
value: "production"
|
|
- name: PORT
|
|
value: "3002"
|
|
- name: DATABASE_URL
|
|
valueFrom:
|
|
secretKeyRef:
|
|
name: chat-db-credentials
|
|
key: connection-string
|
|
resources:
|
|
requests:
|
|
cpu: 250m
|
|
memory: 256Mi
|
|
limits:
|
|
cpu: 1000m
|
|
memory: 512Mi
|
|
livenessProbe:
|
|
httpGet:
|
|
path: /api/health
|
|
port: 3002
|
|
initialDelaySeconds: 30
|
|
periodSeconds: 10
|
|
timeoutSeconds: 5
|
|
failureThreshold: 3
|
|
readinessProbe:
|
|
httpGet:
|
|
path: /api/health
|
|
port: 3002
|
|
initialDelaySeconds: 10
|
|
periodSeconds: 5
|
|
timeoutSeconds: 3
|
|
failureThreshold: 3
|
|
securityContext:
|
|
allowPrivilegeEscalation: false
|
|
readOnlyRootFilesystem: true
|
|
capabilities:
|
|
drop:
|
|
- ALL
|
|
---
|
|
apiVersion: v1
|
|
kind: Service
|
|
metadata:
|
|
name: chat-backend
|
|
namespace: manacore
|
|
spec:
|
|
type: ClusterIP
|
|
ports:
|
|
- port: 3002
|
|
targetPort: 3002
|
|
protocol: TCP
|
|
name: http
|
|
selector:
|
|
app: chat
|
|
component: backend
|
|
---
|
|
apiVersion: networking.k8s.io/v1
|
|
kind: Ingress
|
|
metadata:
|
|
name: chat-backend
|
|
namespace: manacore
|
|
annotations:
|
|
cert-manager.io/cluster-issuer: "letsencrypt-prod"
|
|
nginx.ingress.kubernetes.io/ssl-redirect: "true"
|
|
spec:
|
|
ingressClassName: nginx
|
|
tls:
|
|
- hosts:
|
|
- api-chat.manacore.app
|
|
secretName: chat-backend-tls
|
|
rules:
|
|
- host: api-chat.manacore.app
|
|
http:
|
|
paths:
|
|
- path: /
|
|
pathType: Prefix
|
|
backend:
|
|
service:
|
|
name: chat-backend
|
|
port:
|
|
number: 3002
|
|
```
|
|
|
|
**Helm Chart Structure:**
|
|
|
|
```
|
|
k8s/
|
|
├── base/
|
|
│ ├── deployment.yaml
|
|
│ ├── service.yaml
|
|
│ ├── ingress.yaml
|
|
│ └── configmap.yaml
|
|
├── overlays/
|
|
│ ├── staging/
|
|
│ │ └── kustomization.yaml
|
|
│ └── production/
|
|
│ └── kustomization.yaml
|
|
└── helm/
|
|
└── manacore/
|
|
├── Chart.yaml
|
|
├── values.yaml
|
|
├── values-staging.yaml
|
|
├── values-production.yaml
|
|
└── templates/
|
|
├── deployment.yaml
|
|
├── service.yaml
|
|
├── ingress.yaml
|
|
└── hpa.yaml
|
|
```
|
|
|
|
---
|
|
|
|
## Deployment Topology
|
|
|
|
### 1. Environment Stages
|
|
|
|
```
|
|
┌─────────────────────────────────────────────────────────────────────┐
|
|
│ DEPLOYMENT PIPELINE │
|
|
├─────────────────────────────────────────────────────────────────────┤
|
|
│ │
|
|
│ [Development] → [Staging] → [Production] │
|
|
│ ↓ ↓ ↓ │
|
|
│ Local Docker Coolify Coolify/K8s │
|
|
│ 127.0.0.1 staging.* app domains │
|
|
│ Hot reload Manual test Blue-green │
|
|
│ No SSL Let's Encrypt Let's Encrypt │
|
|
│ │
|
|
└─────────────────────────────────────────────────────────────────────┘
|
|
```
|
|
|
|
#### Development Environment
|
|
|
|
- **Location:** Developer workstations
|
|
- **Orchestration:** Docker Compose
|
|
- **Database:** Local PostgreSQL (Docker)
|
|
- **Domains:** `localhost`, `*.localhost`
|
|
- **SSL:** None
|
|
- **Purpose:** Feature development, debugging
|
|
|
|
#### Staging Environment
|
|
|
|
- **Location:** Hetzner VPS (CCX32)
|
|
- **Orchestration:** Docker Compose
|
|
- **Database:** Dedicated Supabase project (staging)
|
|
- **Domains:** `staging-chat.manacore.app`, `staging-api-chat.manacore.app`
|
|
- **SSL:** Let's Encrypt via Traefik
|
|
- **Purpose:** Integration testing, QA, stakeholder demos
|
|
|
|
#### Production Environment
|
|
|
|
- **Location:** Hetzner VPS (CCX42) or Kubernetes (future)
|
|
- **Orchestration:** Docker Compose with zero-downtime deployments
|
|
- **Database:** Production Supabase projects (per-project isolation)
|
|
- **Domains:** `chat.manacore.app`, `api-chat.manacore.app`, etc.
|
|
- **SSL:** Let's Encrypt with auto-renewal
|
|
- **Purpose:** Live customer traffic
|
|
|
|
---
|
|
|
|
### 2. Deployment Regions
|
|
|
|
**Current Strategy:** Single-region deployment (Europe-West3)
|
|
|
|
**Multi-Region Expansion (Future):**
|
|
|
|
```
|
|
┌─────────────────────────────────────────────────────────────────┐
|
|
│ GLOBAL DEPLOYMENT │
|
|
├─────────────────────────────────────────────────────────────────┤
|
|
│ │
|
|
│ [US-East] [EU-West] [Asia-Pacific] │
|
|
│ Primary Primary Primary │
|
|
│ Replicas: 2 Replicas: 3 Replicas: 2 │
|
|
│ │
|
|
│ ┌─────────────────────────────────────────────────┐ │
|
|
│ │ Cloudflare CDN (Global Edge) │ │
|
|
│ │ - Astro landing pages (cached) │ │
|
|
│ │ - Expo OTA bundles (cached) │ │
|
|
│ │ - API requests (proxied to nearest region) │ │
|
|
│ └─────────────────────────────────────────────────┘ │
|
|
│ │
|
|
│ Database: Supabase (auto-replication across regions) │
|
|
│ │
|
|
└─────────────────────────────────────────────────────────────────┘
|
|
```
|
|
|
|
---
|
|
|
|
### 3. Blue-Green Deployment Strategy
|
|
|
|
**Concept:** Zero-downtime deployments by running two identical production environments.
|
|
|
|
```
|
|
┌─────────────────────────────────────────────────────────────────┐
|
|
│ BLUE-GREEN DEPLOYMENT │
|
|
├─────────────────────────────────────────────────────────────────┤
|
|
│ │
|
|
│ [Load Balancer / Coolify Proxy] │
|
|
│ ↓ │
|
|
│ ┌──────────────────┐ ┌──────────────────┐ │
|
|
│ │ BLUE (Live) │ │ GREEN (Standby) │ │
|
|
│ │ Version: 1.5.2 │ │ Version: 1.6.0 │ │
|
|
│ │ Traffic: 100% │ │ Traffic: 0% │ │
|
|
│ └──────────────────┘ └──────────────────┘ │
|
|
│ │
|
|
│ Deployment Steps: │
|
|
│ 1. Deploy new version to GREEN │
|
|
│ 2. Run smoke tests on GREEN │
|
|
│ 3. Switch 10% traffic to GREEN (canary) │
|
|
│ 4. Monitor metrics for 10 minutes │
|
|
│ 5. Switch 100% traffic to GREEN │
|
|
│ 6. Keep BLUE running for 1 hour (rollback window) │
|
|
│ 7. Decommission BLUE │
|
|
│ │
|
|
└─────────────────────────────────────────────────────────────────┘
|
|
```
|
|
|
|
**Rollback Procedure:**
|
|
|
|
```bash
|
|
# Instant rollback by switching traffic back to BLUE
|
|
coolify switch-deployment blue
|
|
|
|
# Or with Kubernetes
|
|
kubectl set image deployment/chat-backend chat-backend=registry.manacore.app/chat-backend:v1.5.2
|
|
```
|
|
|
|
**Database Migration Handling:**
|
|
|
|
- **Forward-compatible migrations only**: New code can read old schema
|
|
- **Two-phase migrations**:
|
|
1. Deploy schema changes (additive only)
|
|
2. Deploy code that uses new schema
|
|
3. Remove old columns in next release
|
|
|
|
---
|
|
|
|
### 4. Health Checks & Readiness Probes
|
|
|
|
**NestJS Health Check Endpoint:**
|
|
|
|
```typescript
|
|
// src/health/health.controller.ts
|
|
import { Controller, Get } from '@nestjs/common';
|
|
import { HealthCheck, HealthCheckService, TypeOrmHealthIndicator } from '@nestjs/terminus';
|
|
|
|
@Controller('api/health')
|
|
export class HealthController {
|
|
constructor(
|
|
private health: HealthCheckService,
|
|
private db: TypeOrmHealthIndicator,
|
|
) {}
|
|
|
|
@Get()
|
|
@HealthCheck()
|
|
check() {
|
|
return this.health.check([
|
|
() => this.db.pingCheck('database'),
|
|
]);
|
|
}
|
|
}
|
|
```
|
|
|
|
**SvelteKit Health Check Endpoint:**
|
|
|
|
```typescript
|
|
// src/routes/api/health/+server.ts
|
|
import type { RequestHandler } from './$types';
|
|
|
|
export const GET: RequestHandler = async () => {
|
|
return new Response('OK', {
|
|
status: 200,
|
|
headers: { 'Content-Type': 'text/plain' }
|
|
});
|
|
};
|
|
```
|
|
|
|
**Health Check Configuration:**
|
|
|
|
```yaml
|
|
# docker-compose.yml
|
|
healthcheck:
|
|
test: ["CMD", "wget", "--spider", "-q", "http://localhost:3002/api/health"]
|
|
interval: 30s # Check every 30 seconds
|
|
timeout: 10s # Fail if no response in 10s
|
|
retries: 3 # Mark unhealthy after 3 consecutive failures
|
|
start_period: 40s # Grace period for app startup
|
|
```
|
|
|
|
---
|
|
|
|
## Data Architecture
|
|
|
|
### 1. Database Strategy
|
|
|
|
#### Supabase Integration Pattern
|
|
|
|
```
|
|
┌─────────────────────────────────────────────────────────────────┐
|
|
│ SUPABASE MULTI-TENANCY │
|
|
├─────────────────────────────────────────────────────────────────┤
|
|
│ │
|
|
│ Separate Supabase Project per Product: │
|
|
│ │
|
|
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
|
|
│ │ Chat DB │ │ Memoro DB │ │ Picture DB │ │
|
|
│ │ (Supabase) │ │ (Supabase) │ │ (Supabase) │ │
|
|
│ │ │ │ │ │ │ │
|
|
│ │ - messages │ │ - memos │ │ - images │ │
|
|
│ │ - threads │ │ - memories │ │ - prompts │ │
|
|
│ │ - models │ │ - blueprints │ │ - generations│ │
|
|
│ └──────────────┘ └──────────────┘ └──────────────┘ │
|
|
│ │
|
|
│ Shared Auth Database (Mana Core Auth): │
|
|
│ ┌──────────────────────────────────────┐ │
|
|
│ │ PostgreSQL (Docker/Cloud) │ │
|
|
│ │ - users │ │
|
|
│ │ - sessions │ │
|
|
│ │ - credits │ │
|
|
│ │ - subscriptions │ │
|
|
│ └──────────────────────────────────────┘ │
|
|
│ │
|
|
└─────────────────────────────────────────────────────────────────┘
|
|
```
|
|
|
|
**Rationale for Separate Supabase Projects:**
|
|
|
|
- **Data isolation**: Security boundary per product
|
|
- **Independent scaling**: Each project has its own compute resources
|
|
- **Schema evolution**: Migrate databases independently
|
|
- **Billing transparency**: Track costs per product
|
|
- **RLS policies**: Easier to manage with per-project isolation
|
|
|
|
---
|
|
|
|
#### Connection Pooling
|
|
|
|
**Problem:** NestJS apps open many DB connections, exceeding Supabase limits (default: 60 connections).
|
|
|
|
**Solution:** PgBouncer connection pooler (Supabase built-in).
|
|
|
|
**Configuration:**
|
|
|
|
```typescript
|
|
// Backend connection string (transaction pooling)
|
|
DATABASE_URL=postgresql://user:pass@db.project.supabase.co:6543/postgres?pgbouncer=true
|
|
|
|
// For migrations (session pooling)
|
|
MIGRATION_DATABASE_URL=postgresql://user:pass@db.project.supabase.co:5432/postgres
|
|
```
|
|
|
|
**Docker Environment:**
|
|
|
|
```yaml
|
|
# docker-compose.prod.yml
|
|
environment:
|
|
DATABASE_URL: ${DATABASE_URL}?pgbouncer=true&connection_limit=10
|
|
```
|
|
|
|
**Connection Limits per Service:**
|
|
|
|
| Service Type | Max Connections | Pool Size | Rationale |
|
|
|--------------|----------------|-----------|-----------|
|
|
| NestJS Backend | 10 | 5 | API requests are short-lived |
|
|
| SvelteKit Web | 5 | 3 | SSR queries are quick |
|
|
| Migration Script | 1 | 1 | One-time operation |
|
|
|
|
---
|
|
|
|
### 2. Migration Workflow
|
|
|
|
**Environment Progression:**
|
|
|
|
```
|
|
Development → Staging → Production
|
|
↓ ↓ ↓
|
|
Local DB Staging DB Prod DB
|
|
```
|
|
|
|
**Migration Process:**
|
|
|
|
1. **Development:**
|
|
```bash
|
|
# Generate migration
|
|
pnpm --filter @chat/backend migration:generate --name add-user-preferences
|
|
|
|
# Apply migration locally
|
|
pnpm --filter @chat/backend migration:run
|
|
```
|
|
|
|
2. **Staging:**
|
|
```bash
|
|
# CI/CD pipeline applies migrations before deploying code
|
|
docker exec chat-backend pnpm migration:run
|
|
```
|
|
|
|
3. **Production:**
|
|
```bash
|
|
# Manual trigger (after staging validation)
|
|
kubectl exec -it chat-backend-pod -- pnpm migration:run
|
|
|
|
# Or automated (Coolify)
|
|
coolify deploy chat-backend --run-migrations
|
|
```
|
|
|
|
**Migration Safety Rules:**
|
|
|
|
- ✅ **Safe migrations** (can run while old code is live):
|
|
- Add new table
|
|
- Add new column (with default value)
|
|
- Add index (concurrent)
|
|
- Expand enum values
|
|
|
|
- ❌ **Unsafe migrations** (require blue-green deployment):
|
|
- Remove column
|
|
- Rename column
|
|
- Change column type
|
|
- Remove enum value
|
|
|
|
**Example Migration (Drizzle ORM):**
|
|
|
|
```typescript
|
|
// migrations/0001_add_user_preferences.ts
|
|
import { sql } from 'drizzle-orm';
|
|
import { pgTable, text, jsonb, timestamp } from 'drizzle-orm/pg-core';
|
|
|
|
export const userPreferences = pgTable('user_preferences', {
|
|
id: text('id').primaryKey(),
|
|
userId: text('user_id').notNull().references(() => users.id),
|
|
preferences: jsonb('preferences').notNull().default('{}'),
|
|
createdAt: timestamp('created_at').defaultNow(),
|
|
updatedAt: timestamp('updated_at').defaultNow(),
|
|
});
|
|
|
|
export async function up(db) {
|
|
await db.execute(sql`
|
|
CREATE TABLE user_preferences (
|
|
id TEXT PRIMARY KEY,
|
|
user_id TEXT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
|
|
preferences JSONB NOT NULL DEFAULT '{}',
|
|
created_at TIMESTAMPTZ DEFAULT NOW(),
|
|
updated_at TIMESTAMPTZ DEFAULT NOW()
|
|
);
|
|
CREATE INDEX idx_user_preferences_user_id ON user_preferences(user_id);
|
|
`);
|
|
}
|
|
|
|
export async function down(db) {
|
|
await db.execute(sql`DROP TABLE user_preferences;`);
|
|
}
|
|
```
|
|
|
|
---
|
|
|
|
### 3. Backup & Recovery Strategy
|
|
|
|
**Supabase Automatic Backups:**
|
|
|
|
- **Daily backups**: Retained for 7 days (Pro plan)
|
|
- **Point-in-time recovery**: Up to 7 days (Pro plan)
|
|
- **Geographic replication**: Multi-region redundancy
|
|
|
|
**Custom Backup Script:**
|
|
|
|
```bash
|
|
#!/bin/bash
|
|
# scripts/backup-db.sh
|
|
|
|
PROJECT_REF="your-project-ref"
|
|
BACKUP_DIR="/backups/$(date +%Y-%m-%d)"
|
|
|
|
# Create backup
|
|
pg_dump "$DATABASE_URL" \
|
|
--format=custom \
|
|
--compress=9 \
|
|
--file="$BACKUP_DIR/chat-db-$(date +%Y%m%d-%H%M%S).dump"
|
|
|
|
# Upload to S3/R2
|
|
aws s3 cp "$BACKUP_DIR" s3://manacore-backups/ --recursive
|
|
|
|
# Retain only last 30 days
|
|
find /backups -mtime +30 -delete
|
|
```
|
|
|
|
**Restore Procedure:**
|
|
|
|
```bash
|
|
# Download backup
|
|
aws s3 cp s3://manacore-backups/2025-11-27/chat-db-20251127-120000.dump ./
|
|
|
|
# Restore to database
|
|
pg_restore --clean --if-exists \
|
|
--dbname="$DATABASE_URL" \
|
|
./chat-db-20251127-120000.dump
|
|
```
|
|
|
|
**Disaster Recovery RPO/RTO:**
|
|
|
|
- **RPO (Recovery Point Objective)**: < 24 hours (daily backups)
|
|
- **RTO (Recovery Time Objective)**: < 1 hour (automated restore)
|
|
|
|
---
|
|
|
|
### 4. Redis Caching Strategy
|
|
|
|
**Use Cases:**
|
|
|
|
| Service | Cache Key Pattern | TTL | Purpose |
|
|
|---------|------------------|-----|---------|
|
|
| Mana Core Auth | `session:{sessionId}` | 7 days | JWT session storage |
|
|
| Mana Core Auth | `credits:{userId}` | 5 minutes | Credit balance cache |
|
|
| Chat Backend | `models:list` | 1 hour | AI model metadata |
|
|
| Picture Backend | `generations:{userId}:{day}` | 24 hours | Daily usage quota |
|
|
| Uload Backend | `url:{shortCode}` | Permanent | URL redirect cache |
|
|
|
|
**Redis Configuration:**
|
|
|
|
```yaml
|
|
# docker-compose.prod.yml
|
|
redis:
|
|
image: redis:7-alpine
|
|
command: >
|
|
redis-server
|
|
--requirepass ${REDIS_PASSWORD}
|
|
--maxmemory 512mb
|
|
--maxmemory-policy allkeys-lru
|
|
--appendonly yes
|
|
--appendfsync everysec
|
|
volumes:
|
|
- redis-data:/data
|
|
```
|
|
|
|
**Cache Invalidation Strategy:**
|
|
|
|
```typescript
|
|
// Example: Invalidate user credits cache on update
|
|
async updateCredits(userId: string, amount: number) {
|
|
await this.db.updateCredits(userId, amount);
|
|
await this.redis.del(`credits:${userId}`); // Invalidate cache
|
|
}
|
|
```
|
|
|
|
---
|
|
|
|
## Network Architecture
|
|
|
|
### 1. Domain & Subdomain Strategy
|
|
|
|
```
|
|
┌─────────────────────────────────────────────────────────────────┐
|
|
│ DOMAIN ARCHITECTURE │
|
|
├─────────────────────────────────────────────────────────────────┤
|
|
│ │
|
|
│ Root Domain: manacore.app │
|
|
│ │
|
|
│ Product Structure: │
|
|
│ ┌──────────────────────────────────────────────────┐ │
|
|
│ │ Landing (Astro) → chat.manacore.app │ │
|
|
│ │ Web App (Svelte) → app-chat.manacore.app │ │
|
|
│ │ API (NestJS) → api-chat.manacore.app │ │
|
|
│ │ Mobile (Expo) → N/A (native apps) │ │
|
|
│ └──────────────────────────────────────────────────┘ │
|
|
│ │
|
|
│ Example: Chat Project │
|
|
│ - https://chat.manacore.app → Astro landing │
|
|
│ - https://app-chat.manacore.app → SvelteKit web app │
|
|
│ - https://api-chat.manacore.app → NestJS backend │
|
|
│ │
|
|
│ Infrastructure: │
|
|
│ - https://auth.manacore.app → Mana Core Auth │
|
|
│ - https://status.manacore.app → Status page (UptimeRobot)│
|
|
│ - https://docs.manacore.app → API documentation │
|
|
│ │
|
|
│ All domains: │
|
|
│ - SSL via Let's Encrypt (Coolify auto-provision) │
|
|
│ - HTTP/2 enabled │
|
|
│ - HSTS headers (max-age=31536000) │
|
|
│ - Cloudflare DNS (with proxy for DDoS protection) │
|
|
│ │
|
|
└─────────────────────────────────────────────────────────────────┘
|
|
```
|
|
|
|
**DNS Records (Cloudflare):**
|
|
|
|
```
|
|
Type Name Target Proxy
|
|
─────────────────────────────────────────────────────────────────────
|
|
A chat.manacore.app 185.230.123.45 (Coolify IP) Yes
|
|
A app-chat.manacore.app 185.230.123.45 Yes
|
|
A api-chat.manacore.app 185.230.123.45 No*
|
|
CNAME *.manacore.app manacore.app Yes
|
|
|
|
* API endpoints should NOT be proxied through Cloudflare to avoid caching issues
|
|
```
|
|
|
|
---
|
|
|
|
### 2. SSL/TLS Certificate Management
|
|
|
|
**Coolify Automatic SSL:**
|
|
|
|
```yaml
|
|
# .coolify/settings.yml
|
|
ssl:
|
|
provider: letsencrypt
|
|
email: devops@manacore.app
|
|
staging: false # Use production Let's Encrypt
|
|
auto_renew: true
|
|
renewal_days_before: 30
|
|
```
|
|
|
|
**Manual SSL (Certbot):**
|
|
|
|
```bash
|
|
# Initial setup
|
|
certbot certonly --standalone \
|
|
-d chat.manacore.app \
|
|
-d api-chat.manacore.app \
|
|
--email devops@manacore.app \
|
|
--agree-tos
|
|
|
|
# Auto-renewal cron job
|
|
0 0 * * * certbot renew --quiet --post-hook "systemctl reload nginx"
|
|
```
|
|
|
|
**SSL Configuration (Nginx):**
|
|
|
|
```nginx
|
|
# /etc/nginx/sites-available/chat.manacore.app
|
|
server {
|
|
listen 443 ssl http2;
|
|
server_name chat.manacore.app;
|
|
|
|
ssl_certificate /etc/letsencrypt/live/chat.manacore.app/fullchain.pem;
|
|
ssl_certificate_key /etc/letsencrypt/live/chat.manacore.app/privkey.pem;
|
|
ssl_protocols TLSv1.2 TLSv1.3;
|
|
ssl_ciphers HIGH:!aNULL:!MD5;
|
|
ssl_prefer_server_ciphers on;
|
|
|
|
# HSTS
|
|
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
|
|
|
|
# Security headers
|
|
add_header X-Frame-Options "SAMEORIGIN" always;
|
|
add_header X-Content-Type-Options "nosniff" always;
|
|
add_header X-XSS-Protection "1; mode=block" always;
|
|
|
|
location / {
|
|
proxy_pass http://localhost:3100; # chat-web container
|
|
proxy_set_header Host $host;
|
|
proxy_set_header X-Real-IP $remote_addr;
|
|
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
|
|
proxy_set_header X-Forwarded-Proto $scheme;
|
|
}
|
|
}
|
|
```
|
|
|
|
---
|
|
|
|
### 3. API Gateway vs Direct Service Exposure
|
|
|
|
**Current Recommendation:** Direct service exposure (no API gateway initially).
|
|
|
|
**Rationale:**
|
|
|
|
- **Simplicity**: Each backend has its own domain
|
|
- **Low traffic volume**: Gateway overhead not justified yet
|
|
- **Independent scaling**: Services scale independently
|
|
- **Coolify routing**: Built-in reverse proxy handles routing
|
|
|
|
**Future API Gateway (Kong/Traefik) - When to Adopt:**
|
|
|
|
- Traffic > 10,000 req/min
|
|
- Need centralized rate limiting
|
|
- Require complex routing (A/B testing, canary deployments)
|
|
- Centralized authentication/authorization
|
|
|
|
**Example Kong Configuration (Future):**
|
|
|
|
```yaml
|
|
# kong.yml
|
|
_format_version: "3.0"
|
|
|
|
services:
|
|
- name: chat-backend
|
|
url: http://chat-backend:3002
|
|
routes:
|
|
- name: chat-api
|
|
paths:
|
|
- /api/chat
|
|
strip_path: true
|
|
plugins:
|
|
- name: rate-limiting
|
|
config:
|
|
minute: 100
|
|
- name: cors
|
|
config:
|
|
origins:
|
|
- https://app-chat.manacore.app
|
|
|
|
- name: picture-backend
|
|
url: http://picture-backend:3005
|
|
routes:
|
|
- name: picture-api
|
|
paths:
|
|
- /api/picture
|
|
```
|
|
|
|
---
|
|
|
|
### 4. CORS Configuration
|
|
|
|
**Backend CORS Setup (NestJS):**
|
|
|
|
```typescript
|
|
// src/main.ts
|
|
import { NestFactory } from '@nestjs/core';
|
|
import { AppModule } from './app.module';
|
|
|
|
async function bootstrap() {
|
|
const app = await NestFactory.create(AppModule);
|
|
|
|
app.enableCors({
|
|
origin: [
|
|
'https://app-chat.manacore.app', // Production web app
|
|
'https://chat.manacore.app', // Landing page
|
|
'http://localhost:5173', // Development web app
|
|
'http://localhost:3000', // Development landing
|
|
'capacitor://localhost', // Mobile app (Capacitor)
|
|
'ionic://localhost', // Mobile app (Ionic)
|
|
],
|
|
credentials: true,
|
|
methods: ['GET', 'POST', 'PUT', 'DELETE', 'PATCH', 'OPTIONS'],
|
|
allowedHeaders: ['Content-Type', 'Authorization', 'X-App-ID'],
|
|
});
|
|
|
|
await app.listen(3002);
|
|
}
|
|
bootstrap();
|
|
```
|
|
|
|
**Environment-Specific CORS:**
|
|
|
|
```typescript
|
|
// config/cors.config.ts
|
|
const allowedOrigins = {
|
|
development: ['http://localhost:*'],
|
|
staging: ['https://staging-*.manacore.app'],
|
|
production: ['https://*.manacore.app'],
|
|
};
|
|
|
|
export const getCorsOrigins = () => {
|
|
const env = process.env.NODE_ENV || 'development';
|
|
return allowedOrigins[env];
|
|
};
|
|
```
|
|
|
|
---
|
|
|
|
### 5. CDN for Static Assets
|
|
|
|
**Strategy:** Cloudflare CDN in front of Astro landing pages.
|
|
|
|
**Benefits:**
|
|
|
|
- **Global edge caching**: 275+ data centers worldwide
|
|
- **DDoS protection**: Automatic mitigation
|
|
- **Compression**: Brotli + Gzip
|
|
- **Image optimization**: Polish feature (WebP conversion)
|
|
- **Caching rules**: Configurable per path
|
|
|
|
**Cloudflare Page Rules:**
|
|
|
|
```
|
|
Rule 1: Cache Everything
|
|
URL: https://chat.manacore.app/*
|
|
Settings:
|
|
- Cache Level: Cache Everything
|
|
- Edge Cache TTL: 1 month
|
|
- Browser Cache TTL: 1 week
|
|
|
|
Rule 2: Bypass Cache for API
|
|
URL: https://api-chat.manacore.app/*
|
|
Settings:
|
|
- Cache Level: Bypass
|
|
|
|
Rule 3: Image Optimization
|
|
URL: https://chat.manacore.app/images/*
|
|
Settings:
|
|
- Polish: Lossless
|
|
- Mirage: On (lazy loading)
|
|
```
|
|
|
|
**Astro Build Configuration:**
|
|
|
|
```javascript
|
|
// astro.config.mjs
|
|
export default defineConfig({
|
|
output: 'static',
|
|
build: {
|
|
inlineStylesheets: 'auto',
|
|
assets: '_assets',
|
|
},
|
|
vite: {
|
|
build: {
|
|
rollupOptions: {
|
|
output: {
|
|
assetFileNames: 'assets/[name].[hash][extname]',
|
|
chunkFileNames: 'chunks/[name].[hash].js',
|
|
entryFileNames: 'entry/[name].[hash].js',
|
|
},
|
|
},
|
|
},
|
|
},
|
|
});
|
|
```
|
|
|
|
**Cache-Control Headers:**
|
|
|
|
```nginx
|
|
# Nginx config for Astro landing pages
|
|
location ~* \.(js|css|png|jpg|jpeg|gif|ico|svg|woff|woff2)$ {
|
|
expires 1y;
|
|
add_header Cache-Control "public, immutable";
|
|
}
|
|
|
|
location ~* \.(html)$ {
|
|
expires 1h;
|
|
add_header Cache-Control "public, must-revalidate";
|
|
}
|
|
```
|
|
|
|
---
|
|
|
|
## Environment Configuration Matrix
|
|
|
|
### Service Environment Variables
|
|
|
|
| Service | Env Var | Development | Staging | Production | Secret |
|
|
|---------|---------|-------------|---------|------------|--------|
|
|
| **mana-core-auth** |
|
|
| | `PORT` | 3001 | 3001 | 3001 | No |
|
|
| | `DATABASE_URL` | `postgresql://localhost:5432/manacore` | `postgresql://staging-db/manacore` | `postgresql://prod-db/manacore` | Yes |
|
|
| | `REDIS_HOST` | localhost | redis | redis | No |
|
|
| | `JWT_PRIVATE_KEY` | (dev key) | (staging key) | (prod key) | Yes |
|
|
| | `STRIPE_SECRET_KEY` | `sk_test_...` | `sk_test_...` | `sk_live_...` | Yes |
|
|
| **chat-backend** |
|
|
| | `PORT` | 3002 | 3002 | 3002 | No |
|
|
| | `DATABASE_URL` | Supabase (dev) | Supabase (staging) | Supabase (prod) | Yes |
|
|
| | `AZURE_OPENAI_API_KEY` | (dev key) | (staging key) | (prod key) | Yes |
|
|
| | `MANA_CORE_AUTH_URL` | `http://localhost:3001` | `https://auth-staging.manacore.app` | `https://auth.manacore.app` | No |
|
|
| **chat-web** |
|
|
| | `PUBLIC_BACKEND_URL` | `http://localhost:3002` | `https://api-staging-chat.manacore.app` | `https://api-chat.manacore.app` | No |
|
|
| | `PUBLIC_SUPABASE_URL` | Supabase (dev) | Supabase (staging) | Supabase (prod) | No |
|
|
| | `PUBLIC_SUPABASE_ANON_KEY` | (dev anon key) | (staging anon key) | (prod anon key) | No |
|
|
|
|
**Secret Management:**
|
|
|
|
- **Development:** `.env.development` (committed to git)
|
|
- **Staging/Production:** Coolify secrets UI or Kubernetes secrets
|
|
|
|
```bash
|
|
# Coolify secret injection
|
|
coolify env set chat-backend \
|
|
AZURE_OPENAI_API_KEY=secret123 \
|
|
DATABASE_URL=postgresql://...
|
|
```
|
|
|
|
**Kubernetes Secrets:**
|
|
|
|
```yaml
|
|
# k8s/secrets.yaml
|
|
apiVersion: v1
|
|
kind: Secret
|
|
metadata:
|
|
name: chat-backend-secrets
|
|
namespace: manacore
|
|
type: Opaque
|
|
data:
|
|
database-url: cG9zdGdyZXNxbDovLy4uLg== # base64 encoded
|
|
azure-api-key: c2VjcmV0MTIz # base64 encoded
|
|
```
|
|
|
|
---
|
|
|
|
## Monitoring & Observability
|
|
|
|
### 1. Logging Aggregation
|
|
|
|
**Architecture:**
|
|
|
|
```
|
|
┌─────────────────────────────────────────────────────────────────┐
|
|
│ LOGGING PIPELINE │
|
|
├─────────────────────────────────────────────────────────────────┤
|
|
│ │
|
|
│ [Services] │
|
|
│ ↓ stdout/stderr │
|
|
│ [Docker Logs] │
|
|
│ ↓ Docker logging driver │
|
|
│ [Loki / ELK Stack] │
|
|
│ ↓ Aggregation & indexing │
|
|
│ [Grafana / Kibana] │
|
|
│ ↓ Visualization & alerts │
|
|
│ [On-call Engineer] │
|
|
│ │
|
|
└─────────────────────────────────────────────────────────────────┘
|
|
```
|
|
|
|
**Docker Logging Driver (Loki):**
|
|
|
|
```yaml
|
|
# docker-compose.prod.yml
|
|
x-logging: &default-logging
|
|
driver: loki
|
|
options:
|
|
loki-url: "http://loki:3100/loki/api/v1/push"
|
|
loki-batch-size: "400"
|
|
loki-retries: "3"
|
|
labels: "project,service,environment"
|
|
|
|
services:
|
|
chat-backend:
|
|
logging: *default-logging
|
|
labels:
|
|
logging.project: "chat"
|
|
logging.service: "backend"
|
|
logging.environment: "production"
|
|
```
|
|
|
|
**Structured Logging (NestJS):**
|
|
|
|
```typescript
|
|
// src/logging/logger.service.ts
|
|
import { Injectable, Logger as NestLogger } from '@nestjs/common';
|
|
|
|
@Injectable()
|
|
export class LoggerService extends NestLogger {
|
|
log(message: string, context?: string) {
|
|
super.log(JSON.stringify({
|
|
level: 'info',
|
|
timestamp: new Date().toISOString(),
|
|
context,
|
|
message,
|
|
environment: process.env.NODE_ENV,
|
|
service: 'chat-backend',
|
|
}));
|
|
}
|
|
|
|
error(message: string, trace?: string, context?: string) {
|
|
super.error(JSON.stringify({
|
|
level: 'error',
|
|
timestamp: new Date().toISOString(),
|
|
context,
|
|
message,
|
|
trace,
|
|
environment: process.env.NODE_ENV,
|
|
service: 'chat-backend',
|
|
}));
|
|
}
|
|
}
|
|
```
|
|
|
|
**Grafana Loki Query Examples:**
|
|
|
|
```logql
|
|
# All errors in last 1 hour
|
|
{project="chat", level="error"} |= "" | json | line_format "{{.message}}"
|
|
|
|
# High latency requests (>1s)
|
|
{service="backend"} | json | duration > 1s
|
|
|
|
# Failed database connections
|
|
{service="backend"} |~ "database connection failed"
|
|
```
|
|
|
|
---
|
|
|
|
### 2. Application Performance Monitoring (APM)
|
|
|
|
**Recommended Tool:** Sentry (error tracking) + New Relic / Datadog (APM)
|
|
|
|
**Sentry Integration (NestJS):**
|
|
|
|
```typescript
|
|
// src/main.ts
|
|
import * as Sentry from '@sentry/node';
|
|
|
|
Sentry.init({
|
|
dsn: process.env.SENTRY_DSN,
|
|
environment: process.env.NODE_ENV,
|
|
tracesSampleRate: 0.1, // 10% of transactions
|
|
integrations: [
|
|
new Sentry.Integrations.Http({ tracing: true }),
|
|
new Sentry.Integrations.Postgres(),
|
|
],
|
|
});
|
|
|
|
async function bootstrap() {
|
|
const app = await NestFactory.create(AppModule);
|
|
|
|
// Sentry request handler
|
|
app.use(Sentry.Handlers.requestHandler());
|
|
app.use(Sentry.Handlers.tracingHandler());
|
|
|
|
// ... app setup
|
|
|
|
// Sentry error handler
|
|
app.use(Sentry.Handlers.errorHandler());
|
|
|
|
await app.listen(3002);
|
|
}
|
|
```
|
|
|
|
**Metrics to Track:**
|
|
|
|
| Metric | Threshold | Action |
|
|
|--------|-----------|--------|
|
|
| API Response Time (p95) | > 500ms | Alert on-call |
|
|
| Error Rate | > 5% | Alert on-call |
|
|
| Database Query Time (p95) | > 200ms | Investigate slow queries |
|
|
| Memory Usage | > 80% | Scale up or investigate leak |
|
|
| CPU Usage | > 70% | Scale horizontally |
|
|
| Failed Logins | > 100/min | Potential attack, rate limit |
|
|
|
|
---
|
|
|
|
### 3. Metrics Collection (Prometheus + Grafana)
|
|
|
|
**Prometheus Exporter (NestJS):**
|
|
|
|
```typescript
|
|
// src/metrics/metrics.controller.ts
|
|
import { Controller, Get } from '@nestjs/common';
|
|
import { register, Counter, Histogram } from 'prom-client';
|
|
|
|
const httpRequestDuration = new Histogram({
|
|
name: 'http_request_duration_seconds',
|
|
help: 'Duration of HTTP requests in seconds',
|
|
labelNames: ['method', 'route', 'status_code'],
|
|
});
|
|
|
|
const httpRequestTotal = new Counter({
|
|
name: 'http_requests_total',
|
|
help: 'Total number of HTTP requests',
|
|
labelNames: ['method', 'route', 'status_code'],
|
|
});
|
|
|
|
@Controller()
|
|
export class MetricsController {
|
|
@Get('/metrics')
|
|
getMetrics() {
|
|
return register.metrics();
|
|
}
|
|
}
|
|
```
|
|
|
|
**Prometheus Scrape Config:**
|
|
|
|
```yaml
|
|
# prometheus.yml
|
|
scrape_configs:
|
|
- job_name: 'chat-backend'
|
|
static_configs:
|
|
- targets: ['chat-backend:3002']
|
|
metrics_path: '/metrics'
|
|
scrape_interval: 30s
|
|
|
|
- job_name: 'maerchenzauber-backend'
|
|
static_configs:
|
|
- targets: ['maerchenzauber-backend:3003']
|
|
```
|
|
|
|
**Grafana Dashboard:**
|
|
|
|
- **Dashboard 1: Service Health Overview**
|
|
- Request rate (req/sec)
|
|
- Error rate (%)
|
|
- Response time (p50, p95, p99)
|
|
- Active connections
|
|
|
|
- **Dashboard 2: Database Performance**
|
|
- Query duration
|
|
- Connection pool usage
|
|
- Slow queries (>100ms)
|
|
|
|
- **Dashboard 3: Resource Utilization**
|
|
- CPU usage
|
|
- Memory usage
|
|
- Disk I/O
|
|
- Network traffic
|
|
|
|
---
|
|
|
|
### 4. Alert Thresholds
|
|
|
|
**Alert Configuration (Prometheus Alertmanager):**
|
|
|
|
```yaml
|
|
# alertmanager.yml
|
|
groups:
|
|
- name: critical_alerts
|
|
interval: 1m
|
|
rules:
|
|
- alert: HighErrorRate
|
|
expr: rate(http_requests_total{status_code=~"5.."}[5m]) > 0.05
|
|
for: 5m
|
|
labels:
|
|
severity: critical
|
|
annotations:
|
|
summary: "High error rate detected (>5%)"
|
|
description: "Service {{ $labels.service }} has error rate {{ $value }}"
|
|
|
|
- alert: HighResponseTime
|
|
expr: histogram_quantile(0.95, http_request_duration_seconds_bucket) > 0.5
|
|
for: 10m
|
|
labels:
|
|
severity: warning
|
|
annotations:
|
|
summary: "High response time (p95 >500ms)"
|
|
|
|
- alert: DatabaseConnectionPoolExhausted
|
|
expr: pg_pool_available_connections < 2
|
|
for: 2m
|
|
labels:
|
|
severity: critical
|
|
annotations:
|
|
summary: "Database connection pool almost exhausted"
|
|
|
|
- alert: HighMemoryUsage
|
|
expr: container_memory_usage_bytes / container_spec_memory_limit_bytes > 0.8
|
|
for: 5m
|
|
labels:
|
|
severity: warning
|
|
annotations:
|
|
summary: "Container memory usage >80%"
|
|
```
|
|
|
|
**Alert Routing:**
|
|
|
|
```yaml
|
|
# alertmanager.yml
|
|
route:
|
|
receiver: 'default'
|
|
group_by: ['alertname', 'service']
|
|
group_wait: 10s
|
|
group_interval: 10s
|
|
repeat_interval: 12h
|
|
routes:
|
|
- match:
|
|
severity: critical
|
|
receiver: 'pagerduty'
|
|
- match:
|
|
severity: warning
|
|
receiver: 'slack'
|
|
|
|
receivers:
|
|
- name: 'pagerduty'
|
|
pagerduty_configs:
|
|
- service_key: '<pagerduty-service-key>'
|
|
|
|
- name: 'slack'
|
|
slack_configs:
|
|
- api_url: '<slack-webhook-url>'
|
|
channel: '#alerts'
|
|
```
|
|
|
|
---
|
|
|
|
## CI/CD Pipeline
|
|
|
|
### GitHub Actions Workflow
|
|
|
|
**File:** `.github/workflows/deploy-chat.yml`
|
|
|
|
```yaml
|
|
name: Deploy Chat Project
|
|
|
|
on:
|
|
push:
|
|
branches: [main]
|
|
paths:
|
|
- 'apps/chat/**'
|
|
- 'packages/shared-*/**'
|
|
- '.github/workflows/deploy-chat.yml'
|
|
pull_request:
|
|
branches: [main]
|
|
paths:
|
|
- 'apps/chat/**'
|
|
|
|
env:
|
|
REGISTRY: ghcr.io
|
|
IMAGE_PREFIX: manacore
|
|
|
|
jobs:
|
|
# ============================================================================
|
|
# Job 1: Lint & Type Check
|
|
# ============================================================================
|
|
|
|
lint-and-typecheck:
|
|
name: Lint & Type Check
|
|
runs-on: ubuntu-latest
|
|
steps:
|
|
- uses: actions/checkout@v4
|
|
|
|
- name: Setup pnpm
|
|
uses: pnpm/action-setup@v2
|
|
with:
|
|
version: 9.15.0
|
|
|
|
- name: Setup Node.js
|
|
uses: actions/setup-node@v4
|
|
with:
|
|
node-version: '20'
|
|
cache: 'pnpm'
|
|
|
|
- name: Install dependencies
|
|
run: pnpm install --frozen-lockfile
|
|
|
|
- name: Build shared packages
|
|
run: pnpm --filter '@manacore/shared-*' build
|
|
|
|
- name: Lint chat backend
|
|
run: pnpm --filter @chat/backend lint
|
|
|
|
- name: Type check chat backend
|
|
run: pnpm --filter @chat/backend type-check
|
|
|
|
- name: Lint chat web
|
|
run: pnpm --filter @chat/web lint
|
|
|
|
- name: Type check chat web
|
|
run: pnpm --filter @chat/web type-check
|
|
|
|
# ============================================================================
|
|
# Job 2: Build & Push Docker Images
|
|
# ============================================================================
|
|
|
|
build-and-push:
|
|
name: Build Docker Images
|
|
runs-on: ubuntu-latest
|
|
needs: lint-and-typecheck
|
|
if: github.event_name == 'push' && github.ref == 'refs/heads/main'
|
|
strategy:
|
|
matrix:
|
|
service:
|
|
- { name: chat-backend, path: apps/chat/apps/backend, port: 3002 }
|
|
- { name: chat-web, path: apps/chat/apps/web, port: 3000 }
|
|
- { name: chat-landing, path: apps/chat/apps/landing, port: 80 }
|
|
|
|
permissions:
|
|
contents: read
|
|
packages: write
|
|
|
|
steps:
|
|
- uses: actions/checkout@v4
|
|
|
|
- name: Set up Docker Buildx
|
|
uses: docker/setup-buildx-action@v3
|
|
|
|
- name: Log in to GitHub Container Registry
|
|
uses: docker/login-action@v3
|
|
with:
|
|
registry: ${{ env.REGISTRY }}
|
|
username: ${{ github.actor }}
|
|
password: ${{ secrets.GITHUB_TOKEN }}
|
|
|
|
- name: Extract metadata
|
|
id: meta
|
|
uses: docker/metadata-action@v5
|
|
with:
|
|
images: ${{ env.REGISTRY }}/${{ env.IMAGE_PREFIX }}/${{ matrix.service.name }}
|
|
tags: |
|
|
type=ref,event=branch
|
|
type=ref,event=pr
|
|
type=semver,pattern={{version}}
|
|
type=semver,pattern={{major}}.{{minor}}
|
|
type=sha,prefix={{branch}}-
|
|
type=raw,value=latest,enable={{is_default_branch}}
|
|
|
|
- name: Determine Dockerfile
|
|
id: dockerfile
|
|
run: |
|
|
if [[ "${{ matrix.service.name }}" == *-backend ]]; then
|
|
echo "dockerfile=docker/templates/Dockerfile.nestjs" >> $GITHUB_OUTPUT
|
|
elif [[ "${{ matrix.service.name }}" == *-web ]]; then
|
|
echo "dockerfile=docker/templates/Dockerfile.sveltekit" >> $GITHUB_OUTPUT
|
|
elif [[ "${{ matrix.service.name }}" == *-landing ]]; then
|
|
echo "dockerfile=docker/templates/Dockerfile.astro" >> $GITHUB_OUTPUT
|
|
fi
|
|
|
|
- name: Build and push Docker image
|
|
uses: docker/build-push-action@v5
|
|
with:
|
|
context: .
|
|
file: ${{ steps.dockerfile.outputs.dockerfile }}
|
|
build-args: |
|
|
PROJECT_PATH=${{ matrix.service.path }}
|
|
PORT=${{ matrix.service.port }}
|
|
push: true
|
|
tags: ${{ steps.meta.outputs.tags }}
|
|
labels: ${{ steps.meta.outputs.labels }}
|
|
cache-from: type=gha
|
|
cache-to: type=gha,mode=max
|
|
|
|
# ============================================================================
|
|
# Job 3: Deploy to Staging
|
|
# ============================================================================
|
|
|
|
deploy-staging:
|
|
name: Deploy to Staging
|
|
runs-on: ubuntu-latest
|
|
needs: build-and-push
|
|
environment:
|
|
name: staging
|
|
url: https://staging-chat.manacore.app
|
|
|
|
steps:
|
|
- name: Deploy to Coolify (Staging)
|
|
uses: appleboy/ssh-action@v1.0.0
|
|
with:
|
|
host: ${{ secrets.COOLIFY_STAGING_HOST }}
|
|
username: ${{ secrets.COOLIFY_SSH_USER }}
|
|
key: ${{ secrets.COOLIFY_SSH_KEY }}
|
|
script: |
|
|
cd /var/lib/coolify/apps/chat-staging
|
|
docker compose pull
|
|
docker compose up -d --force-recreate
|
|
docker compose exec -T chat-backend pnpm migration:run
|
|
|
|
- name: Health check (Staging)
|
|
run: |
|
|
curl -f https://api-staging-chat.manacore.app/api/health || exit 1
|
|
|
|
# ============================================================================
|
|
# Job 4: Deploy to Production (Manual Approval)
|
|
# ============================================================================
|
|
|
|
deploy-production:
|
|
name: Deploy to Production
|
|
runs-on: ubuntu-latest
|
|
needs: deploy-staging
|
|
environment:
|
|
name: production
|
|
url: https://chat.manacore.app
|
|
|
|
steps:
|
|
- name: Deploy to Coolify (Production)
|
|
uses: appleboy/ssh-action@v1.0.0
|
|
with:
|
|
host: ${{ secrets.COOLIFY_PROD_HOST }}
|
|
username: ${{ secrets.COOLIFY_SSH_USER }}
|
|
key: ${{ secrets.COOLIFY_SSH_KEY }}
|
|
script: |
|
|
cd /var/lib/coolify/apps/chat-production
|
|
|
|
# Blue-green deployment: Deploy to green environment
|
|
docker compose -f docker-compose.green.yml pull
|
|
docker compose -f docker-compose.green.yml up -d --force-recreate
|
|
|
|
# Wait for health check
|
|
sleep 10
|
|
|
|
# Run migrations on green
|
|
docker compose -f docker-compose.green.yml exec -T chat-backend pnpm migration:run
|
|
|
|
# Health check green environment
|
|
curl -f http://localhost:3002/api/health || exit 1
|
|
|
|
# Switch traffic to green (update Coolify routing)
|
|
coolify switch-deployment chat green
|
|
|
|
# Keep blue running for 1 hour (rollback window)
|
|
# Decommission blue after validation
|
|
|
|
- name: Health check (Production)
|
|
run: |
|
|
curl -f https://api-chat.manacore.app/api/health || exit 1
|
|
|
|
- name: Smoke tests
|
|
run: |
|
|
# Basic API tests
|
|
curl -X POST https://api-chat.manacore.app/api/chat/completions \
|
|
-H "Content-Type: application/json" \
|
|
-d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Hello"}]}'
|
|
```
|
|
|
|
**Matrix Strategy for All Projects:**
|
|
|
|
```yaml
|
|
# .github/workflows/deploy-all.yml
|
|
strategy:
|
|
matrix:
|
|
project:
|
|
- chat
|
|
- maerchenzauber
|
|
- manadeck
|
|
- memoro
|
|
- picture
|
|
- uload
|
|
- nutriphi
|
|
- news
|
|
- manacore
|
|
```
|
|
|
|
---
|
|
|
|
## Disaster Recovery
|
|
|
|
### 1. Backup Strategy
|
|
|
|
**What to Backup:**
|
|
|
|
- ✅ **PostgreSQL databases** (Supabase auto-backup + manual pg_dump)
|
|
- ✅ **Redis data** (AOF persistence enabled)
|
|
- ✅ **Docker volumes** (application state, logs)
|
|
- ✅ **Environment variables** (encrypted secrets backup)
|
|
- ✅ **SSL certificates** (Let's Encrypt certs)
|
|
- ❌ **Docker images** (rebuild from source)
|
|
- ❌ **Build artifacts** (regenerate from CI/CD)
|
|
|
|
**Backup Schedule:**
|
|
|
|
| Asset | Frequency | Retention | Storage |
|
|
|-------|-----------|-----------|---------|
|
|
| PostgreSQL | Daily (3 AM UTC) | 30 days | Cloudflare R2 |
|
|
| Redis | Daily (4 AM UTC) | 7 days | Cloudflare R2 |
|
|
| Environment Configs | On change | Indefinite | Git (encrypted) |
|
|
| SSL Certs | Weekly | 90 days | Encrypted backup |
|
|
|
|
**Automated Backup Script:**
|
|
|
|
```bash
|
|
#!/bin/bash
|
|
# scripts/backup-all.sh
|
|
|
|
set -e
|
|
|
|
BACKUP_DIR="/backups/$(date +%Y/%m/%d)"
|
|
S3_BUCKET="s3://manacore-backups"
|
|
|
|
mkdir -p "$BACKUP_DIR"
|
|
|
|
# Backup all databases
|
|
for db in manacore chat maerchenzauber manadeck picture nutriphi; do
|
|
echo "Backing up database: $db"
|
|
pg_dump "$DATABASE_URL/$db" \
|
|
--format=custom \
|
|
--compress=9 \
|
|
--file="$BACKUP_DIR/$db-$(date +%Y%m%d-%H%M%S).dump"
|
|
done
|
|
|
|
# Backup Redis
|
|
echo "Backing up Redis"
|
|
redis-cli --rdb "$BACKUP_DIR/redis-$(date +%Y%m%d-%H%M%S).rdb"
|
|
|
|
# Upload to S3 (Cloudflare R2)
|
|
aws s3 sync "$BACKUP_DIR" "$S3_BUCKET/$(date +%Y/%m/%d)" \
|
|
--endpoint-url https://your-account-id.r2.cloudflarestorage.com
|
|
|
|
# Cleanup local backups older than 7 days
|
|
find /backups -type d -mtime +7 -exec rm -rf {} +
|
|
|
|
echo "Backup completed successfully"
|
|
```
|
|
|
|
**Cron Job:**
|
|
|
|
```cron
|
|
# Run backup daily at 3 AM UTC
|
|
0 3 * * * /opt/manacore/scripts/backup-all.sh >> /var/log/manacore-backup.log 2>&1
|
|
```
|
|
|
|
---
|
|
|
|
### 2. Recovery Procedures
|
|
|
|
#### Scenario 1: Database Corruption
|
|
|
|
```bash
|
|
# 1. Stop application
|
|
docker compose stop chat-backend
|
|
|
|
# 2. Download latest backup
|
|
aws s3 cp s3://manacore-backups/2025/11/27/chat-20251127-030000.dump ./
|
|
|
|
# 3. Drop corrupted database
|
|
psql -U manacore -c "DROP DATABASE chat;"
|
|
psql -U manacore -c "CREATE DATABASE chat;"
|
|
|
|
# 4. Restore from backup
|
|
pg_restore --dbname="postgresql://manacore:pass@localhost/chat" \
|
|
--clean --if-exists \
|
|
./chat-20251127-030000.dump
|
|
|
|
# 5. Restart application
|
|
docker compose start chat-backend
|
|
|
|
# 6. Verify health
|
|
curl -f https://api-chat.manacore.app/api/health
|
|
```
|
|
|
|
**RTO:** ~15 minutes
|
|
**RPO:** < 24 hours (last daily backup)
|
|
|
|
---
|
|
|
|
#### Scenario 2: Complete Server Failure
|
|
|
|
```bash
|
|
# 1. Provision new server (same specs)
|
|
# 2. Install Docker + Coolify
|
|
curl -fsSL https://cdn.coollabs.io/coolify/install.sh | bash
|
|
|
|
# 3. Clone repository
|
|
git clone https://github.com/manacore/manacore-monorepo.git
|
|
cd manacore-monorepo
|
|
|
|
# 4. Restore environment variables (from encrypted backup)
|
|
gpg --decrypt secrets-backup.gpg > .env.production
|
|
|
|
# 5. Restore databases
|
|
./scripts/restore-all-databases.sh
|
|
|
|
# 6. Deploy all services
|
|
docker compose -f docker-compose.prod.yml up -d
|
|
|
|
# 7. Update DNS records (point to new server IP)
|
|
# 8. Verify all services healthy
|
|
```
|
|
|
|
**RTO:** ~2 hours
|
|
**RPO:** < 24 hours
|
|
|
|
---
|
|
|
|
#### Scenario 3: Accidental Data Deletion
|
|
|
|
**Example:** User accidentally deleted critical records.
|
|
|
|
```bash
|
|
# 1. Identify time of deletion
|
|
# 2. Find latest backup BEFORE deletion
|
|
aws s3 ls s3://manacore-backups/2025/11/27/
|
|
|
|
# 3. Restore to temporary database
|
|
pg_restore --dbname="postgresql://localhost/chat_temp" \
|
|
./chat-20251127-120000.dump
|
|
|
|
# 4. Extract deleted records
|
|
psql -U manacore chat_temp -c \
|
|
"COPY (SELECT * FROM messages WHERE id IN ('uuid1','uuid2')) TO STDOUT" \
|
|
> deleted_records.csv
|
|
|
|
# 5. Import to production database
|
|
psql -U manacore chat -c \
|
|
"COPY messages FROM STDIN CSV" < deleted_records.csv
|
|
|
|
# 6. Verify restoration
|
|
psql -U manacore chat -c \
|
|
"SELECT * FROM messages WHERE id IN ('uuid1','uuid2')"
|
|
```
|
|
|
|
---
|
|
|
|
### 3. Failover Strategies
|
|
|
|
#### Active-Passive (Current)
|
|
|
|
```
|
|
┌─────────────────────────────────────────────────────────────────┐
|
|
│ ACTIVE-PASSIVE FAILOVER │
|
|
├─────────────────────────────────────────────────────────────────┤
|
|
│ │
|
|
│ [Primary Server - EU-West] │
|
|
│ ┌────────────────────────────┐ │
|
|
│ │ Chat Backend (Active) │ │
|
|
│ │ Picture Backend (Active) │ │
|
|
│ │ All Web Apps (Active) │ │
|
|
│ └────────────────────────────┘ │
|
|
│ │
|
|
│ [Standby Server - US-East] (Cold Standby) │
|
|
│ ┌────────────────────────────┐ │
|
|
│ │ Services: Stopped │ │
|
|
│ │ Disk: Daily backup sync │ │
|
|
│ │ Activation: Manual │ │
|
|
│ └────────────────────────────┘ │
|
|
│ │
|
|
│ Failover Time: ~2 hours (manual) │
|
|
│ │
|
|
└─────────────────────────────────────────────────────────────────┘
|
|
```
|
|
|
|
**Failover Trigger:**
|
|
1. Primary server down > 30 minutes
|
|
2. Health checks fail > 10 consecutive times
|
|
3. Network unreachable
|
|
|
|
**Manual Failover Steps:**
|
|
```bash
|
|
# 1. Verify primary is down
|
|
curl -f https://api-chat.manacore.app/api/health
|
|
|
|
# 2. Activate standby server
|
|
ssh standby-server "docker compose -f docker-compose.prod.yml up -d"
|
|
|
|
# 3. Update DNS (short TTL)
|
|
# A record: chat.manacore.app → standby-server-ip
|
|
|
|
# 4. Wait for DNS propagation (~5 minutes with TTL=300)
|
|
|
|
# 5. Verify all services healthy on standby
|
|
./scripts/health-check-all.sh
|
|
```
|
|
|
|
---
|
|
|
|
#### Active-Active (Future)
|
|
|
|
**Multi-region setup with load balancing:**
|
|
|
|
```
|
|
[Cloudflare Load Balancer]
|
|
↓
|
|
┌────┴────┐
|
|
↓ ↓
|
|
[EU-West] [US-East]
|
|
Chat-1 Chat-2
|
|
Picture-1 Picture-2
|
|
```
|
|
|
|
**Benefits:**
|
|
- Zero-downtime failover (automatic)
|
|
- Geographic load distribution
|
|
- Better performance for global users
|
|
|
|
**Challenges:**
|
|
- Database replication complexity
|
|
- Session state synchronization
|
|
- 2x infrastructure cost
|
|
|
|
---
|
|
|
|
## Security Hardening
|
|
|
|
### 1. Container Security
|
|
|
|
```dockerfile
|
|
# Security best practices in Dockerfile
|
|
|
|
# 1. Non-root user
|
|
RUN addgroup -g 1001 nodejs && adduser -u 1001 -G nodejs -s /bin/sh -D nodejs
|
|
USER nodejs
|
|
|
|
# 2. Read-only root filesystem
|
|
# (configured in docker-compose.yml)
|
|
|
|
# 3. Minimal base image
|
|
FROM node:20-alpine # Not node:20 (Debian)
|
|
|
|
# 4. No unnecessary packages
|
|
RUN apk add --no-cache postgresql-client wget
|
|
# Avoid: apt-get install curl git vim ...
|
|
|
|
# 5. Scan for vulnerabilities
|
|
# Run: trivy image chat-backend:latest
|
|
```
|
|
|
|
**Docker Compose Security:**
|
|
|
|
```yaml
|
|
services:
|
|
chat-backend:
|
|
security_opt:
|
|
- no-new-privileges:true
|
|
read_only: true
|
|
tmpfs:
|
|
- /tmp:noexec,nosuid,size=100m
|
|
cap_drop:
|
|
- ALL
|
|
cap_add:
|
|
- NET_BIND_SERVICE
|
|
```
|
|
|
|
---
|
|
|
|
### 2. Network Security
|
|
|
|
**Firewall Rules (iptables/ufw):**
|
|
|
|
```bash
|
|
# Allow only necessary ports
|
|
ufw default deny incoming
|
|
ufw default allow outgoing
|
|
ufw allow 22/tcp # SSH
|
|
ufw allow 80/tcp # HTTP
|
|
ufw allow 443/tcp # HTTPS
|
|
ufw enable
|
|
|
|
# Block direct access to backend ports (only via reverse proxy)
|
|
ufw deny 3001:3100/tcp
|
|
```
|
|
|
|
**Docker Network Isolation:**
|
|
|
|
```yaml
|
|
networks:
|
|
frontend:
|
|
driver: bridge
|
|
backend:
|
|
driver: bridge
|
|
internal: true # No external access
|
|
|
|
services:
|
|
chat-web:
|
|
networks:
|
|
- frontend
|
|
- backend
|
|
|
|
chat-backend:
|
|
networks:
|
|
- backend # Not exposed to internet
|
|
|
|
postgres:
|
|
networks:
|
|
- backend # Internal only
|
|
```
|
|
|
|
---
|
|
|
|
### 3. Secrets Management
|
|
|
|
**Current:** Coolify environment variables UI (encrypted at rest)
|
|
|
|
**Future:** HashiCorp Vault or AWS Secrets Manager
|
|
|
|
**Vault Integration Example:**
|
|
|
|
```typescript
|
|
// src/config/vault.config.ts
|
|
import * as vault from 'node-vault';
|
|
|
|
const vaultClient = vault({
|
|
endpoint: process.env.VAULT_ADDR,
|
|
token: process.env.VAULT_TOKEN,
|
|
});
|
|
|
|
export async function getSecret(path: string) {
|
|
const result = await vaultClient.read(path);
|
|
return result.data;
|
|
}
|
|
|
|
// Usage
|
|
const dbPassword = await getSecret('secret/database/chat-backend');
|
|
```
|
|
|
|
---
|
|
|
|
### 4. Rate Limiting
|
|
|
|
**NestJS Throttler:**
|
|
|
|
```typescript
|
|
// src/app.module.ts
|
|
import { ThrottlerModule } from '@nestjs/throttler';
|
|
|
|
@Module({
|
|
imports: [
|
|
ThrottlerModule.forRoot({
|
|
ttl: 60, // Time window (seconds)
|
|
limit: 100, // Max requests per window
|
|
}),
|
|
],
|
|
})
|
|
export class AppModule {}
|
|
```
|
|
|
|
**Nginx Rate Limiting:**
|
|
|
|
```nginx
|
|
# /etc/nginx/nginx.conf
|
|
http {
|
|
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;
|
|
|
|
server {
|
|
location /api/ {
|
|
limit_req zone=api_limit burst=20 nodelay;
|
|
proxy_pass http://backend;
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
---
|
|
|
|
### 5. Security Headers
|
|
|
|
```typescript
|
|
// src/main.ts (NestJS)
|
|
import helmet from 'helmet';
|
|
|
|
app.use(helmet({
|
|
contentSecurityPolicy: {
|
|
directives: {
|
|
defaultSrc: ["'self'"],
|
|
scriptSrc: ["'self'", "'unsafe-inline'"],
|
|
styleSrc: ["'self'", "'unsafe-inline'"],
|
|
imgSrc: ["'self'", "data:", "https:"],
|
|
},
|
|
},
|
|
hsts: {
|
|
maxAge: 31536000,
|
|
includeSubDomains: true,
|
|
preload: true,
|
|
},
|
|
}));
|
|
```
|
|
|
|
**HTTP Headers:**
|
|
|
|
```
|
|
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
|
|
X-Frame-Options: SAMEORIGIN
|
|
X-Content-Type-Options: nosniff
|
|
X-XSS-Protection: 1; mode=block
|
|
Referrer-Policy: strict-origin-when-cross-origin
|
|
Permissions-Policy: geolocation=(), microphone=(), camera=()
|
|
```
|
|
|
|
---
|
|
|
|
## Implementation Roadmap
|
|
|
|
### Phase 1: Foundation (Week 1-2)
|
|
|
|
- [ ] Create Dockerfile templates (NestJS, SvelteKit, Astro)
|
|
- [ ] Enhance `docker-compose.dev.yml` with all projects
|
|
- [ ] Set up shared PostgreSQL + Redis containers
|
|
- [ ] Test local development workflow
|
|
- [ ] Document environment variable mapping
|
|
|
|
### Phase 2: CI/CD (Week 3-4)
|
|
|
|
- [ ] Set up GitHub Actions workflows (per project)
|
|
- [ ] Configure Docker image registry (GitHub Container Registry)
|
|
- [ ] Implement automated testing in CI
|
|
- [ ] Set up staging environment on Coolify
|
|
- [ ] Implement blue-green deployment scripts
|
|
|
|
### Phase 3: Production Deployment (Week 5-6)
|
|
|
|
- [ ] Deploy `mana-core-auth` to production
|
|
- [ ] Deploy first project (chat) end-to-end
|
|
- [ ] Set up monitoring (Prometheus + Grafana)
|
|
- [ ] Configure alerting (PagerDuty + Slack)
|
|
- [ ] Implement automated backups
|
|
|
|
### Phase 4: Rollout (Week 7-8)
|
|
|
|
- [ ] Deploy remaining 8 projects
|
|
- [ ] Set up CDN for Astro landing pages
|
|
- [ ] Configure DNS and SSL for all domains
|
|
- [ ] Load testing and performance optimization
|
|
- [ ] Documentation and runbooks
|
|
|
|
### Phase 5: Optimization (Week 9-10)
|
|
|
|
- [ ] Implement caching strategies (Redis)
|
|
- [ ] Set up APM (Sentry + New Relic)
|
|
- [ ] Security audit and penetration testing
|
|
- [ ] Disaster recovery drills
|
|
- [ ] Team training on deployment procedures
|
|
|
|
---
|
|
|
|
## Appendix
|
|
|
|
### A. Port Allocation Matrix
|
|
|
|
| Service | Dev Port | Staging Port | Prod Port | Protocol |
|
|
|---------|----------|--------------|-----------|----------|
|
|
| mana-core-auth | 3001 | 3001 | 3001 | HTTP |
|
|
| chat-backend | 3002 | 3002 | 3002 | HTTP |
|
|
| chat-web | 3100 | 3100 | 3100 | HTTP |
|
|
| chat-landing | 3200 | 3200 | 3200 | HTTP |
|
|
| maerchenzauber-backend | 3003 | 3003 | 3003 | HTTP |
|
|
| maerchenzauber-web | 3110 | 3110 | 3110 | HTTP |
|
|
| maerchenzauber-landing | 3210 | 3210 | 3210 | HTTP |
|
|
| picture-backend | 3005 | 3005 | 3005 | HTTP |
|
|
| picture-web | 3150 | 3150 | 3150 | HTTP |
|
|
| PostgreSQL | 5432 | 5432 | N/A (Supabase) | TCP |
|
|
| Redis | 6379 | 6379 | 6379 | TCP |
|
|
|
|
### B. Resource Requirements
|
|
|
|
**Per Service (Minimum):**
|
|
|
|
| Service Type | CPU | Memory | Disk |
|
|
|--------------|-----|--------|------|
|
|
| NestJS Backend | 0.5 vCPU | 512 MB | 1 GB |
|
|
| SvelteKit Web | 0.25 vCPU | 256 MB | 500 MB |
|
|
| Astro Landing (Nginx) | 0.1 vCPU | 128 MB | 100 MB |
|
|
| PostgreSQL | 1 vCPU | 2 GB | 50 GB |
|
|
| Redis | 0.25 vCPU | 256 MB | 5 GB |
|
|
|
|
**Total Infrastructure (Production):**
|
|
|
|
- **CPU:** ~15 vCPU
|
|
- **Memory:** ~15 GB
|
|
- **Disk:** ~100 GB (excluding databases)
|
|
- **Estimated Monthly Cost:** $150-$300 (single server) or $500-$800 (multi-region)
|
|
|
|
### C. Useful Commands Reference
|
|
|
|
```bash
|
|
# Build all Docker images
|
|
./scripts/build-all-images.sh
|
|
|
|
# Deploy specific project
|
|
docker compose --profile chat up -d
|
|
|
|
# View logs
|
|
docker compose logs -f chat-backend
|
|
|
|
# Health check all services
|
|
./scripts/health-check-all.sh
|
|
|
|
# Backup all databases
|
|
./scripts/backup-all.sh
|
|
|
|
# Restore database
|
|
./scripts/restore-db.sh chat 2025-11-27
|
|
|
|
# Rollback deployment
|
|
./scripts/rollback.sh chat v1.5.2
|
|
|
|
# Scale service
|
|
docker compose up -d --scale chat-backend=3
|
|
```
|
|
|
|
---
|
|
|
|
## Conclusion
|
|
|
|
This deployment architecture provides:
|
|
|
|
- **Scalability:** Horizontal scaling per service
|
|
- **Reliability:** Blue-green deployments with instant rollback
|
|
- **Security:** Non-root containers, read-only filesystems, secrets management
|
|
- **Observability:** Comprehensive logging, metrics, and alerting
|
|
- **Disaster Recovery:** Automated backups with <1 hour RTO
|
|
- **Developer Experience:** Local Docker Compose mirrors production
|
|
- **Cost Efficiency:** Shared infrastructure (PostgreSQL, Redis) reduces overhead
|
|
|
|
**Next Steps:**
|
|
|
|
1. Review this architecture with the team
|
|
2. Prioritize Phase 1 implementation
|
|
3. Create Dockerfiles for all services
|
|
4. Set up CI/CD pipelines
|
|
5. Deploy to staging environment
|
|
|
|
**Questions or Feedback:** Contact the DevOps team or create an issue in the monorepo.
|
|
|
|
---
|
|
|
|
**Document Version:** 1.0
|
|
**Last Updated:** 2025-11-27
|
|
**Maintained By:** Hive Mind Swarm - Analyst Agent
|