mirror of
https://github.com/Memo-2023/mana-monorepo.git
synced 2026-05-14 20:01:09 +02:00
fix(mac-mini): make startup.sh idempotent and non-destructive
The previous startup.sh checked colima status via `colima status | grep running` and, if that failed, ran `colima stop --force` unconditionally before starting. This is destructive: a transient status mis-detection can kill a healthy running VM, and the subsequent start often hangs because of leftover locks/processes.

Triggered today during the ManaCore→Mana rename: reloading the docker-startup LaunchAgent ran the script, which falsely concluded colima was down, killed the running VM, and left 12 zombie limactl processes plus a stale disk lock symlink. The whole production stack (incl. Forgejo) was offline until manual cleanup.

Changes:
- Use `docker info` as the readiness check instead of `colima status`: it directly tests the thing we care about (docker socket reachable)
- Only do cleanup work when we actually need to start; never SIGKILL a running VM as a "precaution"
- When we do need to start: reap any zombie limactl/colima processes from prior failed runs, and clear the stale disk-in-use lock if no process actually holds it
- Verify successful start with `docker info`, not `colima status`

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
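The incident above came from a single failed probe being treated as proof that the runtime was down. As a minimal sketch of one way to harden that decision further, the probe could be retried a few times before any cleanup runs. This is not part of the commit: `wait_until_ready` and its arguments are hypothetical names introduced here for illustration, and the probe command is passed in so the helper does not depend on Docker itself.

```shell
#!/bin/sh
# Hypothetical helper (not in startup.sh): run a probe command up to
# <attempts> times, sleeping <delay> seconds between failed attempts.
# Returns 0 as soon as the probe succeeds, 1 if every attempt fails.
#
# Intended use, assuming docker is on PATH:
#   wait_until_ready 3 2 docker info
wait_until_ready() {
    wur_attempts="$1"
    wur_delay="$2"
    shift 2
    wur_i=1
    while [ "$wur_i" -le "$wur_attempts" ]; do
        # The probe's output is irrelevant; only its exit status matters.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        wur_i=$((wur_i + 1))
        # Do not sleep after the final failed attempt.
        if [ "$wur_i" -le "$wur_attempts" ]; then
            sleep "$wur_delay"
        fi
    done
    return 1
}
```

With a helper like this, the script's `if docker info >/dev/null 2>&1; then` could become `if wait_until_ready 3 2 docker info; then`, so one transient socket error would not by itself send the script down the cleanup path. Whether the extra startup delay is acceptable is a judgment call for the LaunchAgent's context.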
parent a9529bcf1b
commit af9b1f9369
1 changed file with 29 additions and 9 deletions
@@ -54,15 +54,34 @@ if [ ! -d "/Volumes/ManaData" ]; then
 fi

 # ─── Start Colima ───
-if colima status 2>/dev/null | grep -q "running"; then
-    log "Colima already running"
+# Use `docker info` as the source of truth for "is the runtime usable" instead
+# of `colima status`, which can mis-report and trigger a destructive restart.
+if docker info >/dev/null 2>&1; then
+    log "Colima already running (docker reachable)"
 else
+    log "Colima not reachable, preparing fresh start..."
+
+    # Reap zombie colima/limactl processes from previously failed starts.
+    # These hold locks that prevent a clean start. Do NOT touch a running VM.
+    for pat in "colima stop" "limactl stop" "colima daemon" "limactl hostagent" "limactl usernet"; do
+        pids=$(pgrep -f "$pat" || true)
+        if [ -n "$pids" ]; then
+            log " reaping stale: $pat ($pids)"
+            echo "$pids" | xargs kill -9 2>/dev/null || true
+        fi
+    done
+    sleep 1
+
+    # Clear stale disk lock if no process actually owns it.
+    # The lock is a symlink at /Volumes/ManaData/colima-disk/in_use_by → ~/.colima/_lima/colima
+    # If the symlink exists but no limactl/vz process is running, the lock is stale.
+    LOCK="/Volumes/ManaData/colima-disk/in_use_by"
+    if [ -L "$LOCK" ] && ! pgrep -f "limactl hostagent" >/dev/null 2>&1; then
+        log " removing stale disk lock: $LOCK"
+        rm -f "$LOCK"
+    fi
+
     log "Starting Colima..."
-
-    # Clear stale process state from hard shutdown (stop only, never delete — delete wipes all images)
-    colima stop --force 2>/dev/null || true
-    sleep 2
-
     colima start \
         --cpu 8 \
         --memory 12 \
@@ -74,8 +93,9 @@ else
         --mount /Volumes/ManaData:w \
         2>&1 | tee -a "$LOG_FILE"

-    if ! colima status 2>/dev/null | grep -q "running"; then
-        log "ERROR: Colima failed to start"
+    # Verify with docker info, not colima status (more reliable)
+    if ! docker info >/dev/null 2>&1; then
+        log "ERROR: Colima failed to start (docker not reachable)"
         exit 1
     fi
     log "Colima started successfully"
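The stale-lock rule in the diff (the symlink exists, but no owning process is alive) can be isolated into a small predicate, which makes it possible to exercise without a real Colima install. The following is only a sketch under stated assumptions: `lock_is_stale` is a hypothetical name that does not appear in startup.sh, and the "is an owner alive?" check is passed in as a command rather than hard-coded.

```shell
#!/bin/sh
# Hypothetical helper (not in startup.sh): decide whether a lock symlink
# is stale. Arguments: the lock path, followed by a command that exits 0
# when a process owning the lock is still alive.
# Returns 0 (stale) only if the symlink exists AND the owner check fails.
#
# Intended use, mirroring the check in the diff:
#   lock_is_stale "$LOCK" pgrep -f "limactl hostagent" && rm -f "$LOCK"
lock_is_stale() {
    lis_lock="$1"
    shift
    # No symlink at that path means there is no lock to clear.
    [ -L "$lis_lock" ] || return 1
    # An owner is alive, so the lock is legitimately held.
    if "$@" >/dev/null 2>&1; then
        return 1
    fi
    return 0
}
```

Keeping the owner check as a parameter means the same predicate works if the set of processes that can hold the disk changes. The comment in the diff mentions limactl/vz processes, while the condition itself only looks for `limactl hostagent`.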