Steps to reproduce
- Start a model service:

  ```yaml
  type: service
  name: llama31

  # If `image` is not specified, dstack uses its default image
  python: 3.12

  env:
    - HF_TOKEN
    - MODEL_ID=meta-llama/Meta-Llama-3.1-8B-Instruct
    - MAX_MODEL_LEN=4096
  commands:
    - uv pip install vllm
    - vllm serve $MODEL_ID
      --max-model-len $MAX_MODEL_LEN
      --tensor-parallel-size $DSTACK_GPUS_NUM
  port: 8000
  # (Optional) Register the model
  model: meta-llama/Meta-Llama-3.1-8B-Instruct

  # Uncomment to leverage spot instances
  #spot_policy: auto

  resources:
    gpu: 24GB
  ```
- Before the model is up and running, open the model's chat UI and try to chat. You'll see errors such as connection errors and CORS errors (a CLI sketch of this follows below).
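For reference, a minimal sketch of the same reproduction from the command line, assuming the configuration above is saved as `.dstack.yml` and the service is exposed at a hypothetical gateway hostname (`llama31.example-gateway.com` is a placeholder, not a real endpoint):

```shell
# Submit the service configuration (assumes it is saved as .dstack.yml)
dstack apply -f .dstack.yml

# While vLLM is still downloading weights and starting up, the endpoint
# is not ready. Probing it at this point fails, which is what the chat UI
# surfaces as connection/CORS errors. Hostname below is a placeholder.
curl -fsS --max-time 5 https://llama31.example-gateway.com/health \
  || echo "service not ready yet"
```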
Actual behaviour
No response
Expected behaviour
Disable the chat UI until the model is ready (if readiness is known from probes). A sketch of such a probe follows.
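As a sketch of what such a readiness check could look like: vLLM's OpenAI-compatible server exposes a `GET /health` endpoint that returns 200 once the model is loaded, so the chat UI could stay disabled until an equivalent check passes. The hostname is again a placeholder, and whether dstack's probes work exactly this way internally is an assumption:

```shell
# Hedged sketch: poll vLLM's /health endpoint until the model is loaded.
# The chat UI could remain disabled until this kind of check succeeds.
# Hostname is a placeholder for the actual service endpoint.
until curl -fsS --max-time 5 https://llama31.example-gateway.com/health > /dev/null; do
  echo "model not ready, retrying..."
  sleep 5
done
echo "model ready - enable chat UI"
```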
dstack version
master
Server logs
Additional information
No response