mirror of
https://github.com/Memo-2023/mana-monorepo.git
synced 2026-05-14 21:41:09 +02:00
- Reduce max-model-len to 4096 for CPU compatibility - Add max-num-batched-tokens matching the context size - Add enforce-eager for stable CPU inference Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> |
||
|---|---|---|
| .. | ||
| setup-vllm.sh | ||
| start-vllm-voxtral.sh | ||