mirror of
https://github.com/Memo-2023/mana-monorepo.git
synced 2026-05-14 19:01:08 +02:00
- Add comprehensive guide in .claude/guidelines/sveltekit-web.md about build-time vs runtime environment variables for Docker deployments - Document the correct hooks.server.ts + window injection pattern - Add Problem 9 to TROUBLESHOOTING.md for hardcoded localhost URLs - List which apps are fixed vs need fixing - Add checklist for fixing new apps 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
1165 lines
35 KiB
Markdown
1165 lines
35 KiB
Markdown
# Troubleshooting Guide
|
|
|
|
Common issues and solutions for the manacore-monorepo.
|
|
|
|
## Table of Contents
|
|
|
|
- [Recursive Turbo Calls](#recursive-turbo-calls)
|
|
- [Build Issues](#build-issues)
|
|
- [Linting Issues](#linting-issues)
|
|
- [NestJS Dependency Injection](#nestjs-dependency-injection)
|
|
- [Staging Deployment Issues](#staging-deployment-issues)
|
|
- [GitHub Running Disabled Workflows](#problem-1-github-running-disabled-workflows)
|
|
- [chat-backend Container Unhealthy](#problem-2-chat-backend-container-unhealthy)
|
|
- [SvelteKit Static Environment Variable Imports](#problem-3-sveltekit-static-environment-variable-imports)
|
|
- [Orphan Docker Containers](#problem-4-orphan-docker-containers)
|
|
- [Client-Side Calling localhost Instead of Public IP](#problem-5-client-side-calling-localhost-instead-of-public-ip)
|
|
- [CORS Blocking Cross-Origin Requests](#problem-6-cors-blocking-cross-origin-requests)
|
|
- [Missing Database Schema](#problem-7-missing-database-schema)
|
|
- [pnpm Symlinks Broken in Docker Container](#problem-8-pnpm-symlinks-broken-in-docker-container)
|
|
- [Hardcoded localhost URLs in SvelteKit Web Apps](#problem-9-hardcoded-localhost-urls-in-sveltekit-web-apps)
|
|
|
|
---
|
|
|
|
## Recursive Turbo Calls
|
|
|
|
### Problem: Infinite Loop / Tasks Running Forever
|
|
|
|
**Symptoms:**
|
|
|
|
- `pnpm run build` runs for 10+ minutes without completing
|
|
- `pnpm run lint` hangs indefinitely
|
|
- `pnpm run type-check` shows thousands of duplicate task entries
|
|
- CI/CD pipelines timeout after 10+ minutes
|
|
|
|
**Root Cause:**
|
|
|
|
Parent workspace packages (e.g., `apps/zitare/package.json`, `apps/presi/package.json`) have scripts that call `turbo run <task>`, creating an **infinite recursion loop**.
|
|
|
|
### How It Happens
|
|
|
|
```
|
|
Root turbo → finds "build" script in apps/zitare/package.json
|
|
→ runs "turbo run build" in zitare
|
|
→ finds "build" script again
|
|
→ runs "turbo run build" again
|
|
→ (infinite loop!)
|
|
```
|
|
|
|
### ❌ WRONG - Causes Infinite Recursion
|
|
|
|
```json
|
|
// apps/zitare/package.json - DON'T DO THIS!
|
|
{
|
|
"scripts": {
|
|
"build": "turbo run build", // ❌ WRONG
|
|
"lint": "turbo run lint", // ❌ WRONG
|
|
"type-check": "turbo run type-check", // ❌ WRONG
|
|
"clean": "turbo run clean" // ❌ WRONG
|
|
}
|
|
}
|
|
```
|
|
|
|
```json
|
|
// apps/picture/package.json - DON'T DO THIS!
|
|
{
|
|
"scripts": {
|
|
"build": "pnpm run --recursive build", // ❌ WRONG
|
|
"lint": "pnpm --filter '@picture/*' run lint" // ❌ WRONG
|
|
}
|
|
}
|
|
```
|
|
|
|
### ✅ CORRECT - Let Root Turbo Handle Orchestration
|
|
|
|
```json
|
|
// apps/zitare/package.json - CORRECT
|
|
{
|
|
"scripts": {
|
|
"dev": "turbo run dev", // ✅ OK for dev (persistent task, scoped)
|
|
// No build, lint, type-check scripts - handled by root turbo
|
|
"db:push": "pnpm --filter @zitare/backend db:push", // ✅ OK
|
|
"db:studio": "pnpm --filter @zitare/backend db:studio" // ✅ OK
|
|
}
|
|
}
|
|
```
|
|
|
|
### Why `dev` is the Exception
|
|
|
|
Using `turbo run dev` in parent packages is acceptable because:
|
|
|
|
1. It's typically run directly on that package (scoped: `pnpm zitare:dev`)
|
|
2. Dev tasks are persistent (long-running) and turbo handles them differently
|
|
3. Root never orchestrates `dev` across all packages simultaneously
|
|
|
|
### The Rule
|
|
|
|
> **Parent workspace packages must NEVER have scripts that call `turbo run <task>` for tasks that turbo orchestrates from the root.**
|
|
|
|
Tasks orchestrated from root (defined in `turbo.json`):
|
|
|
|
- ✅ `build` - Root handles this
|
|
- ✅ `lint` - Root handles this
|
|
- ✅ `type-check` - Root handles this
|
|
- ✅ `test` - Root handles this
|
|
- ✅ `clean` - Root handles this
|
|
- ❌ `dev` - Exception (scoped usage is fine)
|
|
|
|
### How to Fix
|
|
|
|
**If you added a recursive script:**
|
|
|
|
1. Open the parent package.json (e.g., `apps/myapp/package.json`)
|
|
2. Remove the problematic script entirely:
|
|
|
|
```diff
|
|
{
|
|
"scripts": {
|
|
"dev": "turbo run dev",
|
|
- "build": "turbo run build",
|
|
- "lint": "turbo run lint",
|
|
- "type-check": "turbo run type-check",
|
|
"db:push": "pnpm --filter @myapp/backend db:push"
|
|
}
|
|
}
|
|
```
|
|
|
|
3. The root `turbo.json` already handles orchestration for these tasks
|
|
|
|
### Affected Locations
|
|
|
|
Parent packages are located at:
|
|
|
|
- `apps/*/package.json` (e.g., `apps/zitare/package.json`)
|
|
- `games/*/package.json` (e.g., `games/mana-games/package.json`)
|
|
|
|
**Do NOT add turbo scripts here!**
|
|
|
|
Child packages (these are fine):
|
|
|
|
- `apps/*/apps/*/package.json` (e.g., `apps/zitare/apps/backend/package.json`)
|
|
- `packages/*/package.json` (e.g., `packages/shared-theme/package.json`)
|
|
|
|
---
|
|
|
|
## Build Issues
|
|
|
|
### Build Fails with "ELIFECYCLE Command failed"
|
|
|
|
**Check for:**
|
|
|
|
1. **Recursive turbo calls** (see above)
|
|
2. **Missing dependencies** in a package
|
|
3. **TypeScript errors** in source code
|
|
4. **Import/export mismatches**
|
|
|
|
**Debugging:**
|
|
|
|
```bash
|
|
# Run build and capture full output
|
|
pnpm run build 2>&1 | tee build.log
|
|
|
|
# Search for actual error (not just ELIFECYCLE)
|
|
grep -A10 "error during build" build.log
|
|
|
|
# Build specific package to isolate issue
|
|
pnpm --filter @zitare/backend build
|
|
```
|
|
|
|
### Build Times Out in CI
|
|
|
|
**Symptoms:**
|
|
|
|
- CI runs for 10+ minutes
|
|
- Timeout before completion
|
|
- "No output has been received in the last 10m0s"
|
|
|
|
**Solution:**
|
|
|
|
This is almost always caused by **recursive turbo calls**. See the [Recursive Turbo Calls](#recursive-turbo-calls) section above.
|
|
|
|
**Quick fix:**
|
|
|
|
```bash
|
|
# Locally, check if build completes in reasonable time
|
|
time pnpm run build
|
|
|
|
# Should complete in < 2 minutes for clean build
|
|
# Should complete in < 30 seconds for cached build
|
|
```
|
|
|
|
If it takes longer than 2-3 minutes, you have recursive scripts.
|
|
|
|
---
|
|
|
|
## Linting Issues
|
|
|
|
### Lint Hangs or Runs Forever
|
|
|
|
**Same issue as build** - recursive turbo calls!
|
|
|
|
**❌ WRONG:**
|
|
|
|
```json
|
|
// apps/presi/package.json - DON'T DO THIS!
|
|
{
|
|
"scripts": {
|
|
"lint": "pnpm --filter '@presi/*' run lint" // ❌ Recursive
|
|
}
|
|
}
|
|
```
|
|
|
|
**✅ CORRECT:**
|
|
|
|
```json
|
|
// apps/presi/package.json - Remove the lint script
|
|
{
|
|
"scripts": {
|
|
"dev": "pnpm --filter '@presi/*' run dev"
|
|
// No lint script - root turbo handles it
|
|
}
|
|
}
|
|
```
|
|
|
|
**Run lint from root:**
|
|
|
|
```bash
|
|
# Lint all packages
|
|
pnpm run lint
|
|
|
|
# Lint specific package
|
|
pnpm --filter @presi/backend lint
|
|
|
|
# Lint specific project
|
|
pnpm turbo run lint --filter=presi
|
|
```
|
|
|
|
### ESLint Errors
|
|
|
|
**Common issues:**
|
|
|
|
1. **Missing eslint config**
|
|
|
|
```bash
|
|
# Add shared config
|
|
pnpm add -D @manacore/eslint-config --filter @myapp/backend
|
|
```
|
|
|
|
2. **Incompatible ESLint versions**
|
|
|
|
```bash
|
|
# Check versions
|
|
pnpm ls eslint
|
|
|
|
# Update to match root version
|
|
pnpm add -D eslint@latest --filter @myapp/backend
|
|
```
|
|
|
|
---
|
|
|
|
## Prevention Checklist
|
|
|
|
When creating a new app or package:
|
|
|
|
- [ ] **DO NOT** add `build`, `lint`, `type-check`, or `test` scripts to parent packages
|
|
- [ ] **DO** add these scripts to child packages (apps/myapp/apps/backend/package.json)
|
|
- [ ] **DO** use project-specific scripts (e.g., `db:push`, `db:studio`)
|
|
- [ ] **DO** test locally: `pnpm run build` should complete in < 2 minutes
|
|
- [ ] **DO** refer to `CLAUDE.md` for patterns
|
|
|
|
### Quick Validation
|
|
|
|
```bash
|
|
# Check for problematic patterns in parent packages
|
|
for pkg in apps/*/package.json games/*/package.json; do
|
|
if grep -q '"build".*turbo run build' "$pkg" 2>/dev/null; then
|
|
echo "❌ RECURSIVE SCRIPT FOUND: $pkg"
|
|
fi
|
|
done
|
|
```
|
|
|
|
---
|
|
|
|
## NestJS Dependency Injection
|
|
|
|
### Problem: "Nest can't resolve dependencies" Error
|
|
|
|
**Symptoms:**
|
|
|
|
- NestJS fails to start with error: `Nest can't resolve dependencies of the XService (?)`
|
|
- Error mentions "argument Function at index [0] is available"
|
|
- The module imports look correct but service still won't inject
|
|
|
|
**Root Cause:**
|
|
|
|
Using **type-only imports** (`import {X }`) for classes that need to be injected. TypeScript erases type-only imports at compile time, so the actual class is not available at runtime for dependency injection.
|
|
|
|
### ❌ WRONG - Type-Only Import
|
|
|
|
```typescript
|
|
// services/mana-core-auth/src/ai/ai.service.ts - DON'T DO THIS!
|
|
import { Injectable } from '@nestjs/common';
|
|
import { ConfigService } from '@nestjs/config'; // ❌ Type-only import
|
|
|
|
@Injectable()
|
|
export class AiService {
|
|
constructor(private configService: ConfigService) {
|
|
// NestJS can't inject ConfigService because it was type-only imported!
|
|
}
|
|
}
|
|
```
|
|
|
|
**What happens:**
|
|
|
|
1. TypeScript compiles the code
|
|
2. The `type` keyword tells TypeScript to erase the import at compile time
|
|
3. The compiled JS has NO import for ConfigService
|
|
4. At runtime, NestJS can't find the ConfigService class to inject
|
|
5. Error: "Nest can't resolve dependencies of the AiService (?)"
|
|
|
|
### ✅ CORRECT - Regular Import
|
|
|
|
```typescript
|
|
// services/mana-core-auth/src/ai/ai.service.ts - CORRECT
|
|
import { Injectable } from '@nestjs/common';
|
|
import { ConfigService } from '@nestjs/config'; // ✅ Regular import
|
|
|
|
@Injectable()
|
|
export class AiService {
|
|
constructor(private configService: ConfigService) {
|
|
// ConfigService is properly imported and can be injected
|
|
}
|
|
}
|
|
```
|
|
|
|
### The Rule
|
|
|
|
> **For NestJS dependency injection, NEVER use type-only imports (`import {X }`) for classes you need to inject.**
|
|
|
|
- ✅ `import { ConfigService }` - Regular import (works)
|
|
- ❌ `import {ConfigService }` - Type-only import (breaks DI)
|
|
- ✅ `import type { MyInterface }` - Type-only for interfaces (fine, not injected)
|
|
- ✅ `import {MyType, MyClass }` - Mixed (MyType erased, MyClass available)
|
|
|
|
### How to Fix
|
|
|
|
1. Find the service with the DI error
|
|
2. Check all imports for classes used in the constructor
|
|
3. Remove the `type` keyword from class imports:
|
|
|
|
```diff
|
|
import { Injectable } from '@nestjs/common';
|
|
- import {ConfigService } from '@nestjs/config';
|
|
+ import { ConfigService } from '@nestjs/config';
|
|
|
|
@Injectable()
|
|
export class AiService {
|
|
constructor(private configService: ConfigService) {}
|
|
}
|
|
```
|
|
|
|
4. Rebuild and test:
|
|
|
|
```bash
|
|
pnpm --filter mana-core-auth build
|
|
pnpm --filter mana-core-auth start:dev
|
|
```
|
|
|
|
### Debugging
|
|
|
|
If you're still getting DI errors after removing type-only imports:
|
|
|
|
1. **Check the module imports the provider's dependencies:**
|
|
|
|
```typescript
|
|
@Module({
|
|
imports: [ConfigModule], // ← ConfigService needs ConfigModule
|
|
providers: [AiService],
|
|
exports: [AiService],
|
|
})
|
|
export class AiModule {}
|
|
```
|
|
|
|
2. **Verify the compiled JavaScript:**
|
|
|
|
```bash
|
|
# Build the service
|
|
pnpm --filter mana-core-auth build
|
|
|
|
# Check the compiled output
|
|
cat services/mana-core-auth/dist/ai/ai.service.js | grep "require"
|
|
|
|
# Should see:
|
|
# const config_1 = require("@nestjs/config"); ✅ Good
|
|
# NOT:
|
|
# const config_1 = undefined; ❌ Bad (type-only import)
|
|
```
|
|
|
|
3. **Check Docker builds:**
|
|
|
|
If the error only happens in Docker but not locally:
|
|
|
|
```bash
|
|
# Build Docker image without cache
|
|
docker build --no-cache -f services/mana-core-auth/Dockerfile -t test .
|
|
|
|
# Check the compiled code in the image
|
|
docker run --rm --entrypoint cat test /app/dist/ai/ai.service.js
|
|
```
|
|
|
|
### Related Issues
|
|
|
|
- [Commit d69cc607](https://github.com/Memo-2023/manacore-monorepo/commit/d69cc607) - Fixed type-only ConfigService import in AiService
|
|
- TypeScript `import type` vs `import {}` - both erase at compile time
|
|
- Docker layer caching can hide fixes if source wasn't properly copied
|
|
|
|
---
|
|
|
|
## Staging Deployment Issues
|
|
|
|
### Overview
|
|
|
|
This section documents the complete troubleshooting journey for deploying mana-core-auth + chat (backend + web) to staging. It covers GitHub Actions CI/CD simplification, Docker health checks, database setup, and SvelteKit environment variables.
|
|
|
|
### Problem 1: GitHub Running Disabled Workflows
|
|
|
|
**Symptoms:**
|
|
|
|
- Workflows with `.full.yml` extension were still running
|
|
- `test.full.yml` was being recognized as a valid workflow
|
|
- Multiple unnecessary workflows running on every push
|
|
|
|
**What We Tried:**
|
|
|
|
1. ❌ Renaming to `.disabled` extension → Still ran
|
|
2. ❌ Renaming to `.full.yml` extension → Still ran (GitHub recognizes any `.yml` in `.github/workflows/`)
|
|
|
|
**Solution:**
|
|
|
|
- ✅ Rename to `.yml.bak` extension (GitHub ignores non-`.yml` files)
|
|
|
|
```bash
|
|
# Disable a workflow
|
|
mv .github/workflows/test.yml .github/workflows/test.yml.bak
|
|
|
|
# Re-enable a workflow
|
|
mv .github/workflows/test.yml.bak .github/workflows/test.yml
|
|
```
|
|
|
|
**Files Changed:**
|
|
|
|
- `test.yml` → `test.yml.bak`
|
|
- `test-coverage.yml` → `test-coverage.yml.bak`
|
|
- `ci-pull-request.yml` → `ci-pull-request.yml.bak`
|
|
- `dependency-update.yml` → `dependency-update.yml.bak`
|
|
|
|
---
|
|
|
|
### Problem 2: chat-backend Container Unhealthy
|
|
|
|
**Symptoms:**
|
|
|
|
- Deployment failed with: `dependency failed to start: container chat-backend-staging is unhealthy`
|
|
- chat-web wouldn't start because it depends on chat-backend being healthy
|
|
|
|
**Debugging Steps:**
|
|
|
|
```bash
|
|
# Connect to staging server
|
|
ssh -i ~/.ssh/hetzner_deploy_key deploy@46.224.108.214
|
|
|
|
# Check container status
|
|
cd ~/manacore-staging
|
|
docker compose ps
|
|
|
|
# Check logs for the failing container
|
|
docker compose logs chat-backend --tail=100
|
|
|
|
# Test health endpoint manually from inside container
|
|
docker compose exec chat-backend wget -q -O - http://localhost:3002/api/v1/health
|
|
```
|
|
|
|
**Root Cause 1: Missing Database**
|
|
|
|
The logs showed:
|
|
|
|
```
|
|
error: database "chat" does not exist
|
|
```
|
|
|
|
**Fix:** Create the database manually:
|
|
|
|
```bash
|
|
docker compose exec -T postgres psql -U postgres -c "CREATE DATABASE chat;"
|
|
```
|
|
|
|
**Root Cause 2: Wrong Health Check Path**
|
|
|
|
The `docker-compose.staging.yml` had:
|
|
|
|
```yaml
|
|
healthcheck:
|
|
test: ['CMD', 'wget', '...', 'http://localhost:3002/api/health'] # ❌ WRONG
|
|
```
|
|
|
|
But NestJS health endpoint is at `/api/v1/health`:
|
|
|
|
```yaml
|
|
healthcheck:
|
|
test: ['CMD', 'wget', '...', 'http://localhost:3002/api/v1/health'] # ✅ CORRECT
|
|
```
|
|
|
|
**How to Verify Health Endpoints:**
|
|
|
|
| Service | Port | Health Endpoint |
|
|
| -------------- | ---- | ---------------- |
|
|
| mana-core-auth | 3001 | `/api/v1/health` |
|
|
| chat-backend | 3002 | `/api/v1/health` |
|
|
| chat-web | 3000 | `/health` |
|
|
|
|
```bash
|
|
# Test from outside the server
|
|
curl http://46.224.108.214:3001/api/v1/health
|
|
curl http://46.224.108.214:3002/api/v1/health
|
|
curl http://46.224.108.214:3000/health
|
|
```
|
|
|
|
---
|
|
|
|
### Problem 3: SvelteKit Static Environment Variable Imports
|
|
|
|
**Symptoms:**
|
|
|
|
- Docker build failed with: `PUBLIC_MANA_CORE_AUTH_URL is not exported by $env/static/public`
|
|
- Build error during `npm run build` in Docker
|
|
|
|
**Root Cause:**
|
|
|
|
SvelteKit's `$env/static/public` imports are resolved at **build time**, not runtime. When building in Docker, these environment variables don't exist.
|
|
|
|
**❌ WRONG - Static Import (Build Time):**
|
|
|
|
```typescript
|
|
// apps/chat/apps/web/src/lib/stores/auth.svelte.ts
|
|
import { PUBLIC_MANA_CORE_AUTH_URL } from '$env/static/public'; // ❌ Fails in Docker
|
|
|
|
const authUrl = PUBLIC_MANA_CORE_AUTH_URL;
|
|
```
|
|
|
|
**✅ CORRECT - Runtime Environment Variable:**
|
|
|
|
```typescript
|
|
// apps/chat/apps/web/src/lib/stores/auth.svelte.ts
|
|
import { browser } from '$app/environment';
|
|
|
|
function getAuthUrl(): string {
|
|
if (browser && typeof window !== 'undefined') {
|
|
// Client-side: check for injected env or use default
|
|
return (
|
|
(window as unknown as { __PUBLIC_MANA_CORE_AUTH_URL__?: string })
|
|
.__PUBLIC_MANA_CORE_AUTH_URL__ ||
|
|
import.meta.env.PUBLIC_MANA_CORE_AUTH_URL ||
|
|
'http://localhost:3001'
|
|
);
|
|
}
|
|
// Server-side: use process.env or default
|
|
return process.env.PUBLIC_MANA_CORE_AUTH_URL || 'http://localhost:3001';
|
|
}
|
|
```
|
|
|
|
**The Pattern:**
|
|
|
|
1. Check if running in browser
|
|
2. Try window-injected variable (for runtime injection)
|
|
3. Try `import.meta.env` (for Vite build-time)
|
|
4. Fall back to `process.env` (for SSR)
|
|
5. Use localhost default for development
|
|
|
|
**Files Fixed:**
|
|
|
|
- `apps/chat/apps/web/src/lib/stores/auth.svelte.ts`
|
|
- `apps/chat/apps/web/src/lib/services/feedback.ts`
|
|
|
|
---
|
|
|
|
### Problem 4: Orphan Docker Containers
|
|
|
|
**Symptoms:**
|
|
|
|
- Old containers from previous deployments still running
|
|
- `docker compose ps` shows unexpected services
|
|
|
|
**Fix:**
|
|
|
|
```bash
|
|
# Remove orphan containers
|
|
docker compose down --remove-orphans
|
|
|
|
# Bring up fresh
|
|
docker compose up -d
|
|
|
|
# Manually remove specific orphans
|
|
docker rm -f manadeck-backend-staging manacore-nginx-staging
|
|
```
|
|
|
|
---
|
|
|
|
### Problem 5: Client-Side Calling localhost Instead of Public IP
|
|
|
|
**Symptoms:**
|
|
|
|
- Browser console shows: `POST http://localhost:3001/api/v1/auth/register net::ERR_CONNECTION_REFUSED`
|
|
- API calls work from server but fail from browser
|
|
- The injected `window.__PUBLIC_MANA_CORE_AUTH_URL__` is empty or undefined
|
|
|
|
**Root Cause:**
|
|
|
|
SvelteKit's environment variables work differently on server vs client:
|
|
|
|
- **Server-side (SSR):** Has access to `process.env`
|
|
- **Client-side (browser):** Does NOT have access to `process.env` - needs explicit injection
|
|
|
|
The initial fix using `process.env` only worked for SSR. Browser code falls back to `localhost`.
|
|
|
|
**Solution - Runtime Environment Injection:**
|
|
|
|
1. **Add client URLs to docker-compose.staging.yml:**
|
|
|
|
```yaml
|
|
chat-web:
|
|
environment:
|
|
# Server-side URLs (Docker internal network)
|
|
PUBLIC_BACKEND_URL: http://chat-backend:3002
|
|
PUBLIC_MANA_CORE_AUTH_URL: http://mana-core-auth:3001
|
|
# Client-side URLs (browser access via public IP)
|
|
PUBLIC_BACKEND_URL_CLIENT: http://46.224.108.214:3002
|
|
PUBLIC_MANA_CORE_AUTH_URL_CLIENT: http://46.224.108.214:3001
|
|
```
|
|
|
|
2. **Inject into HTML via hooks.server.ts:**
|
|
|
|
```typescript
|
|
// apps/chat/apps/web/src/hooks.server.ts
|
|
import type { Handle } from '@sveltejs/kit';
|
|
|
|
const PUBLIC_MANA_CORE_AUTH_URL_CLIENT =
|
|
process.env.PUBLIC_MANA_CORE_AUTH_URL_CLIENT || process.env.PUBLIC_MANA_CORE_AUTH_URL || '';
|
|
|
|
export const handle: Handle = async ({ event, resolve }) => {
|
|
return resolve(event, {
|
|
transformPageChunk: ({ html }) => {
|
|
const envScript = `<script>
|
|
window.__PUBLIC_MANA_CORE_AUTH_URL__ = "${PUBLIC_MANA_CORE_AUTH_URL_CLIENT}";
|
|
</script>`;
|
|
return html.replace('<head>', `<head>${envScript}`);
|
|
},
|
|
});
|
|
};
|
|
```
|
|
|
|
3. **Read from window in client code:**
|
|
|
|
```typescript
|
|
// apps/chat/apps/web/src/lib/stores/auth.svelte.ts
|
|
function getAuthUrl(): string {
|
|
if (browser && typeof window !== 'undefined') {
|
|
const injectedUrl = (window as unknown as { __PUBLIC_MANA_CORE_AUTH_URL__?: string })
|
|
.__PUBLIC_MANA_CORE_AUTH_URL__;
|
|
return injectedUrl || 'http://localhost:3001';
|
|
}
|
|
return process.env.PUBLIC_MANA_CORE_AUTH_URL || 'http://localhost:3001';
|
|
}
|
|
```
|
|
|
|
**How to Verify:**
|
|
|
|
Open browser DevTools (F12) → Console:
|
|
|
|
```javascript
|
|
window.__PUBLIC_MANA_CORE_AUTH_URL__;
|
|
// Should show: "http://46.224.108.214:3001"
|
|
```
|
|
|
|
---
|
|
|
|
### Problem 6: CORS Blocking Cross-Origin Requests
|
|
|
|
**Symptoms:**
|
|
|
|
- Browser console shows: `Access to fetch at 'http://46.224.108.214:3001/...' from origin 'http://46.224.108.214:3000' has been blocked by CORS policy`
|
|
- API calls work via curl but fail from browser
|
|
- Preflight OPTIONS requests fail
|
|
|
|
**Root Cause:**
|
|
|
|
Browser security blocks requests between different origins (port counts as different origin):
|
|
|
|
- chat-web: `http://46.224.108.214:3000`
|
|
- mana-core-auth: `http://46.224.108.214:3001`
|
|
|
|
Even though they're on the same IP, different ports = different origins = CORS blocked.
|
|
|
|
**Solution:**
|
|
|
|
Add `CORS_ORIGINS` environment variable to mana-core-auth in docker-compose.staging.yml:
|
|
|
|
```yaml
|
|
mana-core-auth:
|
|
environment:
|
|
# ... other env vars ...
|
|
# CORS - Allow chat-web and other staging origins
|
|
CORS_ORIGINS: http://46.224.108.214:3000,http://46.224.108.214:3002,http://localhost:3000
|
|
```
|
|
|
|
**CORS Configuration in mana-core-auth:**
|
|
|
|
The service reads `CORS_ORIGINS` from environment:
|
|
|
|
```typescript
|
|
// services/mana-core-auth/src/config/configuration.ts
|
|
cors: {
|
|
origin: process.env.CORS_ORIGINS?.split(',') || [
|
|
'http://localhost:3000',
|
|
'http://localhost:8081',
|
|
],
|
|
}
|
|
|
|
// services/mana-core-auth/src/main.ts
|
|
const corsOrigins = configService.get<string[]>('cors.origin') || [];
|
|
app.enableCors({
|
|
origin: corsOrigins,
|
|
credentials: true,
|
|
});
|
|
```
|
|
|
|
**How to Verify:**
|
|
|
|
```bash
|
|
# Test CORS preflight
|
|
curl -X OPTIONS http://46.224.108.214:3001/api/v1/auth/register \
|
|
-H "Origin: http://46.224.108.214:3000" \
|
|
-H "Access-Control-Request-Method: POST" \
|
|
-v
|
|
|
|
# Should see in response headers:
|
|
# Access-Control-Allow-Origin: http://46.224.108.214:3000
|
|
```
|
|
|
|
---
|
|
|
|
### Problem 7: Missing Database Schema
|
|
|
|
**Symptoms:**
|
|
|
|
- API returns: `{"statusCode": 500, "message": "relation \"auth.users\" does not exist"}`
|
|
- Registration/login endpoints fail with 500 error
|
|
- Health check passes but auth endpoints fail
|
|
|
|
**Root Cause:**
|
|
|
|
The database exists but the schema hasn't been pushed. Drizzle ORM needs to run `db:push` to create:
|
|
|
|
- `auth` schema with tables: users, accounts, sessions, passwords, verification, etc.
|
|
- `credits` schema with tables: balances, transactions, packages, etc.
|
|
|
|
**Why It Happened:**
|
|
|
|
The CD workflow was calling `pnpm run db:migrate` but that script doesn't exist in the package.json. The correct script is `db:push` which runs `drizzle-kit push`.
|
|
|
|
**Solution:**
|
|
|
|
1. **Manual fix (immediate):**
|
|
|
|
```bash
|
|
ssh -i ~/.ssh/hetzner_deploy_key deploy@46.224.108.214
|
|
cd ~/manacore-staging
|
|
|
|
# Push schema to database (--force skips interactive confirmation)
|
|
docker compose exec -T mana-core-auth npx drizzle-kit push --force
|
|
```
|
|
|
|
2. **Fix CD workflow (permanent):**
|
|
|
|
```yaml
|
|
# .github/workflows/cd-staging.yml - BEFORE
|
|
docker compose exec -T mana-core-auth pnpm run db:migrate || echo "Auth migrations skipped"
|
|
|
|
# .github/workflows/cd-staging.yml - AFTER
|
|
docker compose exec -T mana-core-auth npx drizzle-kit push --force || echo "Auth schema push skipped"
|
|
```
|
|
|
|
**How to Verify:**
|
|
|
|
```bash
|
|
# Check if auth schema exists
|
|
docker compose exec -T postgres psql -U postgres -d manacore_auth -c '\dt auth.*'
|
|
|
|
# Should show 12 tables:
|
|
# auth | accounts, invitations, jwks, members, organizations,
|
|
# passwords, security_events, sessions, two_factor_auth,
|
|
# user_settings, users, verification
|
|
```
|
|
|
|
**Files Changed:**
|
|
|
|
- `.github/workflows/cd-staging.yml` - Line 253: `db:migrate` → `drizzle-kit push --force`
|
|
|
|
---
|
|
|
|
### Complete Staging Deployment Checklist
|
|
|
|
#### Before Deployment
|
|
|
|
- [ ] Verify `docker-compose.staging.yml` has correct health check paths
|
|
- [ ] Verify CI/CD workflow (`cd-staging.yml`) has matching health check paths
|
|
- [ ] Check that required databases exist or CI creates them
|
|
- [ ] Verify CD workflow runs `drizzle-kit push --force` to create schemas (not `db:migrate`)
|
|
- [ ] Verify `CORS_ORIGINS` includes all frontend origins
|
|
- [ ] Verify `PUBLIC_*_CLIENT` env vars have correct public IPs for browser access
|
|
|
|
#### During Deployment Failure
|
|
|
|
1. **SSH to server:**
|
|
|
|
```bash
|
|
ssh -i ~/.ssh/hetzner_deploy_key deploy@46.224.108.214
|
|
cd ~/manacore-staging
|
|
```
|
|
|
|
2. **Check container status:**
|
|
|
|
```bash
|
|
docker compose ps
|
|
```
|
|
|
|
3. **Check logs for failing container:**
|
|
|
|
```bash
|
|
docker compose logs <container-name> --tail=100
|
|
```
|
|
|
|
4. **Common fixes:**
|
|
|
|
```bash
|
|
# Create missing database
|
|
docker compose exec -T postgres psql -U postgres -c "CREATE DATABASE <dbname>;"
|
|
|
|
# Restart a service
|
|
docker compose restart <service-name>
|
|
|
|
# Force recreate
|
|
docker compose up -d --force-recreate <service-name>
|
|
```
|
|
|
|
5. **Verify health:**
|
|
```bash
|
|
curl http://localhost:3001/api/v1/health # mana-core-auth
|
|
curl http://localhost:3002/api/v1/health # chat-backend
|
|
curl http://localhost:3000/health # chat-web
|
|
```
|
|
|
|
#### After Deployment
|
|
|
|
- [ ] Verify all health endpoints respond
|
|
- [ ] Check container logs for errors
|
|
- [ ] Test actual functionality (login, API calls)
|
|
|
|
---
|
|
|
|
### Key Files for Staging Deployment
|
|
|
|
| File | Purpose |
|
|
| ---------------------------------- | ------------------------------------- |
|
|
| `docker-compose.staging.yml` | Service definitions and health checks |
|
|
| `.github/workflows/cd-staging.yml` | CI/CD deployment workflow |
|
|
| `.github/workflows/ci-main.yml` | Docker image builds on push to main |
|
|
|
|
### Health Check Patterns
|
|
|
|
**Docker Compose (`docker-compose.staging.yml`):**
|
|
|
|
```yaml
|
|
healthcheck:
|
|
test: ['CMD', 'wget', '--no-verbose', '--tries=1', '--spider', 'http://localhost:PORT/ENDPOINT']
|
|
interval: 30s
|
|
timeout: 10s
|
|
retries: 3
|
|
start_period: 40s
|
|
```
|
|
|
|
**CI/CD Workflow (`cd-staging.yml`):**
|
|
|
|
```bash
|
|
# Check from inside container
|
|
docker compose exec -T chat-backend wget -q -O - http://localhost:3002/api/v1/health
|
|
```
|
|
|
|
### Lessons Learned
|
|
|
|
1. **GitHub Workflows:** Only files ending in `.yml` or `.yaml` in `.github/workflows/` are recognized. Use `.bak` extension to disable.
|
|
|
|
2. **NestJS Health Endpoints:** All NestJS backends use `/api/v1/health`, not `/api/health`.
|
|
|
|
3. **Docker Compose Dependencies:** When using `depends_on: condition: service_healthy`, the dependent service won't start until the health check passes.
|
|
|
|
4. **Database Creation:** Must happen AFTER PostgreSQL is healthy but BEFORE dependent services run migrations.
|
|
|
|
5. **SvelteKit Environment Variables:** Use runtime patterns (`process.env`, `import.meta.env`) instead of `$env/static/public` for Docker builds.
|
|
|
|
6. **Verify Before Commit:** Always check both `docker-compose.staging.yml` AND CI/CD workflows for matching paths.
|
|
|
|
7. **Server vs Client URLs:** Docker internal URLs (e.g., `http://mana-core-auth:3001`) only work server-side. Browsers need public IPs. Use separate `_CLIENT` env vars for browser access.
|
|
|
|
8. **SvelteKit Runtime Injection:** Use `hooks.server.ts` with `transformPageChunk` to inject environment variables into HTML at runtime. This is the only reliable way to pass server env vars to client code.
|
|
|
|
9. **CORS for Multi-Service Apps:** When frontend and backend are on different ports, configure CORS on the backend. Port differences count as different origins (e.g., `:3000` vs `:3001`).
|
|
|
|
10. **Environment Variable Flow:**
|
|
|
|
```
|
|
docker-compose.yml → Container env → process.env (SSR) → hooks.server.ts → window.__VAR__ (browser)
|
|
```
|
|
|
|
11. **Database Schema vs Database:** Creating a database (`CREATE DATABASE`) is not enough - Drizzle needs `db:push` to create schemas and tables. Health checks may pass with empty database, but API calls will fail with "relation does not exist".
|
|
|
|
12. **Drizzle Kit Interactive Mode:** `drizzle-kit push` prompts for confirmation. Use `--force` flag in CI/CD to skip interactive mode.
|
|
|
|
13. **pnpm Symlinks in Docker:** pnpm uses symlinks to a central `.pnpm` store. When copying `node_modules` in Docker, you must preserve both the symlinks AND the target directory they point to. See [Problem 8](#problem-8-pnpm-symlinks-broken-in-docker-container).
|
|
|
|
---
|
|
|
|
### Problem 8: pnpm Symlinks Broken in Docker Container
|
|
|
|
**Symptoms:**
|
|
|
|
- Container starts but crashes with: `Cannot find package 'date-fns' imported from /app/build/server/chunks/_page.svelte-xxx.js`
|
|
- Error `ERR_MODULE_NOT_FOUND` for packages that ARE in node_modules
|
|
- Works locally but fails in Docker production stage
|
|
- `ls node_modules/date-fns` shows a symlink pointing to `../../../../../node_modules/.pnpm/...`
|
|
|
|
**Root Cause:**
|
|
|
|
pnpm uses symlinks to a central `.pnpm` store at the monorepo root. When you copy only the app's `node_modules` in Docker, the symlinks point to paths that don't exist:
|
|
|
|
```
|
|
# In builder stage (pnpm workspace):
|
|
/app/apps/todo/apps/web/node_modules/date-fns → ../../../../../node_modules/.pnpm/date-fns@4.1.0/node_modules/date-fns
|
|
|
|
# In production stage (old broken approach):
|
|
/app/node_modules/date-fns → ../../../../../node_modules/.pnpm/...
|
|
# ↑ BROKEN! This path doesn't exist because we only copied to /app/
|
|
```
|
|
|
|
**❌ WRONG - Flattening Directory Structure:**
|
|
|
|
```dockerfile
|
|
# Production stage
|
|
FROM node:20-alpine AS production
|
|
|
|
WORKDIR /app # ❌ Different from builder structure
|
|
|
|
# Copy node_modules (symlinks will be broken!)
|
|
COPY --from=builder /app/apps/todo/apps/web/node_modules ./node_modules # ❌ BROKEN
|
|
COPY --from=builder /app/apps/todo/apps/web/build ./build
|
|
COPY --from=builder /app/apps/todo/apps/web/package.json ./
|
|
|
|
CMD ["node", "build"]
|
|
```
|
|
|
|
The symlinks in `node_modules` point to `../../../../../node_modules/.pnpm/...` which resolves to a non-existent path from `/app/`.
|
|
|
|
**✅ CORRECT - Preserve Directory Structure + Copy .pnpm Store:**
|
|
|
|
```dockerfile
|
|
# Production stage
|
|
FROM node:20-alpine AS production
|
|
|
|
# Keep same directory structure as builder so pnpm symlinks resolve correctly
|
|
WORKDIR /app/apps/todo/apps/web # ✅ Same as builder
|
|
|
|
# Copy the pnpm store that symlinks point to (at /app/node_modules/.pnpm)
|
|
COPY --from=builder /app/node_modules/.pnpm /app/node_modules/.pnpm # ✅ Target of symlinks
|
|
|
|
# Copy the app's node_modules (contains symlinks to the pnpm store)
|
|
COPY --from=builder /app/apps/todo/apps/web/node_modules ./node_modules # ✅ Symlinks work now
|
|
|
|
# Copy built application
|
|
COPY --from=builder /app/apps/todo/apps/web/build ./build
|
|
COPY --from=builder /app/apps/todo/apps/web/package.json ./
|
|
|
|
CMD ["node", "build"]
|
|
```
|
|
|
|
**Why This Works:**
|
|
|
|
1. `WORKDIR /app/apps/todo/apps/web` - Production container has same path as builder
|
|
2. Symlinks in `./node_modules/` point to `../../../../../node_modules/.pnpm/...`
|
|
3. From `/app/apps/todo/apps/web/node_modules/`, going up 5 directories reaches `/app/`
|
|
4. `/app/node_modules/.pnpm/` exists because we copied it!
|
|
|
|
**How to Debug:**
|
|
|
|
```bash
|
|
# Check if symlinks are broken in the container
|
|
docker exec <container> ls -la node_modules/date-fns
|
|
# Shows: date-fns -> ../../../../../node_modules/.pnpm/date-fns@4.1.0/node_modules/date-fns
|
|
|
|
# Check if the target exists
|
|
docker exec <container> ls -la /app/node_modules/.pnpm/date-fns@4.1.0/
|
|
# If "No such file or directory" → symlink is broken
|
|
|
|
# Check the image size (should be ~1GB with .pnpm store, ~50MB without)
|
|
docker images | grep todo-web
|
|
```
|
|
|
|
**Trade-offs:**
|
|
|
|
| Approach | Image Size | Symlinks | Works |
|
|
| ------------------------------------- | ---------- | -------- | ----- |
|
|
| Copy only app's node_modules | ~50MB | Broken | ❌ |
|
|
| Copy app's node_modules + .pnpm store | ~1GB | Working | ✅ |
|
|
|
|
The larger image size is the cost of pnpm's deduplication strategy. In a monorepo, this is actually more efficient than copying all dependencies flat.
|
|
|
|
**Alternative: Use npm Instead of pnpm in Docker:**
|
|
|
|
If image size is critical, you could use npm in the Docker build:
|
|
|
|
```dockerfile
|
|
# Alternative approach (not recommended for monorepos)
|
|
FROM node:20-alpine AS production
|
|
WORKDIR /app
|
|
COPY --from=builder /app/apps/todo/apps/web/build ./build
|
|
COPY --from=builder /app/apps/todo/apps/web/package.json ./
|
|
|
|
# Clean install with npm (flattens dependencies)
|
|
RUN npm install --omit=dev
|
|
|
|
CMD ["node", "build"]
|
|
```
|
|
|
|
⚠️ **Warning:** This may fail with `workspace:*` protocol in package.json dependencies. Only works if all dependencies are published to npm.
|
|
|
|
**Affected Files:**
|
|
|
|
- `apps/todo/apps/web/Dockerfile`
|
|
- `apps/manacore/apps/web/Dockerfile`
|
|
- `apps/chat/apps/web/Dockerfile`
|
|
- `apps/calendar/apps/web/Dockerfile`
|
|
- `apps/clock/apps/web/Dockerfile`
|
|
|
|
**Related Commits:**
|
|
|
|
- `fd1c0ee6` - fix(docker): preserve pnpm symlink structure in web Dockerfiles
|
|
|
|
---
|
|
|
|
### Problem 9: Hardcoded localhost URLs in SvelteKit Web Apps
|
|
|
|
**Symptoms:**
|
|
|
|
- Browser console shows: `POST http://localhost:3001/api/v1/auth/login net::ERR_CONNECTION_REFUSED`
|
|
- App works locally but auth fails on staging/production
|
|
- The `window.__PUBLIC_MANA_CORE_AUTH_URL__` may be set correctly, but code doesn't use it
|
|
- Looking at the source code reveals hardcoded URLs like `const MANA_AUTH_URL = 'http://localhost:3001'`
|
|
|
|
**Root Cause:**
|
|
|
|
Developers hardcode `localhost:3001` directly in TypeScript files instead of using the runtime injection pattern. This works locally but breaks in Docker deployments.
|
|
|
|
**Common Locations of Hardcoded URLs:**
|
|
|
|
```typescript
|
|
// ❌ These patterns are WRONG:
|
|
|
|
// In auth.svelte.ts
|
|
const MANA_AUTH_URL = 'http://localhost:3001';
|
|
|
|
// In user-settings.svelte.ts
|
|
const MANA_AUTH_URL = 'http://localhost:3001';
|
|
|
|
// In feedback.ts or feedback page
|
|
apiUrl: 'http://localhost:3001',
|
|
|
|
// Using build-time env vars (also wrong for Docker)
|
|
const MANA_AUTH_URL = import.meta.env.PUBLIC_MANA_CORE_AUTH_URL || 'http://localhost:3001';
|
|
```
|
|
|
|
**Solution:**
|
|
|
|
1. **Create `hooks.server.ts`** if it doesn't exist (see Problem 5)
|
|
2. **Use `getAuthUrl()` pattern in all files:**
|
|
|
|
```typescript
|
|
// ✅ CORRECT pattern
|
|
import { browser } from '$app/environment';
|
|
|
|
function getAuthUrl(): string {
|
|
if (browser && typeof window !== 'undefined') {
|
|
const injectedUrl = (window as unknown as { __PUBLIC_MANA_CORE_AUTH_URL__?: string })
|
|
.__PUBLIC_MANA_CORE_AUTH_URL__;
|
|
return injectedUrl || 'http://localhost:3001';
|
|
}
|
|
return process.env.PUBLIC_MANA_CORE_AUTH_URL || 'http://localhost:3001';
|
|
}
|
|
|
|
// Use getAuthUrl() instead of hardcoded string
|
|
const auth = initializeWebAuth({ baseUrl: getAuthUrl() });
|
|
```
|
|
|
|
**How to Find Hardcoded URLs:**
|
|
|
|
```bash
|
|
# Search for hardcoded localhost:3001 in web apps
|
|
grep -r "localhost:3001" apps/*/apps/web/src --include="*.ts" --include="*.svelte"
|
|
|
|
# Check for the correct pattern (window injection)
|
|
grep -r "__PUBLIC_MANA_CORE_AUTH_URL__" apps/*/apps/web/src
|
|
```
|
|
|
|
**Apps Status (as of 2024-12):**
|
|
|
|
| App | Status | Files to Fix |
|
|
| ------------------- | ------------ | ---------------------------------------------------------------- |
|
|
| `chat/apps/web` | ✅ Fixed | - |
|
|
| `todo/apps/web` | ✅ Fixed | - |
|
|
| `calendar/apps/web` | ✅ Fixed | - |
|
|
| `clock/apps/web` | ✅ Fixed | - |
|
|
| `contacts/apps/web` | ❌ Needs Fix | auth.svelte.ts, user-settings.svelte.ts, feedback.ts |
|
|
| `manadeck/apps/web` | ❌ Needs Fix | user-settings.svelte.ts, feedback.ts |
|
|
| `manacore/apps/web` | ❌ Needs Fix | auth.svelte.ts, user-settings.svelte.ts, feedback.ts, credits.ts |
|
|
| `zitare/apps/web` | ❌ Needs Fix | auth.svelte.ts, user-settings.svelte.ts, feedback.ts |
|
|
| `picture/apps/web` | ❌ Needs Fix | user-settings.svelte.ts |
|
|
|
|
**Complete Fix Checklist for Each App:**
|
|
|
|
- [ ] Create/update `src/hooks.server.ts` with env injection
|
|
- [ ] Update `src/lib/stores/auth.svelte.ts` → use `getAuthUrl()` pattern
|
|
- [ ] Update `src/lib/stores/user-settings.svelte.ts` → use `getAuthUrl()` pattern
|
|
- [ ] Update any `feedback.ts` or feedback services → use `getAuthUrl()` pattern
|
|
- [ ] Update any other files with hardcoded URLs
|
|
- [ ] Add `PUBLIC_MANA_CORE_AUTH_URL_CLIENT` to `docker-compose.staging.yml`
|
|
- [ ] Test locally with `pnpm dev`
|
|
- [ ] Deploy and test on staging
|
|
|
|
**See Also:**
|
|
|
|
- [Problem 5: Client-Side Calling localhost Instead of Public IP](#problem-5-client-side-calling-localhost-instead-of-public-ip)
|
|
- [SvelteKit Web Guidelines - Environment Variables](/.claude/guidelines/sveltekit-web.md#environment-variables)
|
|
|
|
---
|
|
|
|
## References
|
|
|
|
- [CLAUDE.md - Turborepo Configuration](./CLAUDE.md#turborepo-configuration)
|
|
- [turbo.json](./turbo.json) - Root task orchestration
|
|
- [Turborepo Docs](https://turbo.build/repo/docs)
|
|
|
|
## Getting Help
|
|
|
|
If you encounter an issue not covered here:
|
|
|
|
1. Check the [GitHub Issues](https://github.com/Memo-2023/manacore-monorepo/issues)
|
|
2. Review recent commits that may have introduced the issue
|
|
3. Run `pnpm clean` and `pnpm install` to reset
|
|
4. Create a new issue with full error logs
|