Releases: dstackai/dstack
0.20.10
Services
Prefill-Decode disaggregation
dstack now supports disaggregated Prefill–Decode inference, allowing both Prefill and Decode worker types to run within a single service.
To define and run such a service, set pd_disaggregation to true under the router property (this requires the gateway to use the sglang router, and define separate replica groups for Prefill and Decode worker types:
type: service
name: prefill-decode
env:
- HF_TOKEN
- MODEL_ID=zai-org/GLM-4.5-Air-FP8
image: lmsysorg/sglang:latest
replicas:
- count: 1..4
scaling:
metric: rps
target: 3
commands:
- |
python -m sglang.launch_server \
--model-path $MODEL_ID \
--disaggregation-mode prefill \
--disaggregation-transfer-backend mooncake \
--host 0.0.0.0 \
--port 8000 \
--disaggregation-bootstrap-port 8998
resources:
gpu: H200
- count: 1..8
scaling:
metric: rps
target: 2
commands:
- |
python -m sglang.launch_server \
--model-path $MODEL_ID \
--disaggregation-mode decode \
--disaggregation-transfer-backend mooncake \
--host 0.0.0.0 \
--port 8000
resources:
gpu: H200
port: 8000
model: zai-org/GLM-4.5-Air-FP8
probes:
- type: http
url: /health_generate
interval: 15s
router:
type: sglang
pd_disaggregation: trueNote
Note, pd_disaggregation requires both the gateway and replicas to use the same cluster. With dstack, this can now be used with the aws, gcp, kubernetes backends (as they support creating both clusters and gateways). Support for more backends (and eventually SSH fleets) is coming soon.
Currently, pd_disaggregation works only with SGLang. Support for vLLM is coming soon.
Support for additional scaling metrics, such as TTFT and ITL, is also coming soon to enable autoscaling of Prefill and Decode workers.
Model endpoint
If you configure the model property, dstack previously provided a global model endpoint at gateway.<gateway domain> (or /proxy/models/<project name>), allowing access to all models deployed in the project. This endpoint has been deprecated.
Now, any deployed model should be accessed via the service endpoint itself at <run name>.<gateway domain> (or /proxy/services/main/<service name>).
Note
If you configure the model property, dstack automatically enables CORS on the service endpoint. Future versions will allow you to disable or customize this behavior.
CLI
dstack apply
Previously, if you did not specify gpu, dstack treated it as 0..1 but did not display it in the run plan. Now, dstack properly displays this default. Additionally, if you do not specify image, dstack automatically defaults the vendor to nvidia.
dstack apply -f dev.dstack.yml
Project peterschmidt85
User peterschmidt85
Type dev-environment
Resources cpu=2.. mem=8GB.. disk=100GB.. gpu=0..
Spot policy on-demand
Max price off
Retry policy off
Idle duration 5m
Max duration off
Inactivity duration off
# BACKEND RESOURCES INSTANCE TYPE PRICE
1 verda (FIN-01) cpu=4 mem=16GB disk=100GB CPU.4V.16G $0.0279
2 verda (FIN-02) cpu=4 mem=16GB disk=100GB CPU.4V.16G $0.0279
3 verda (FIN-03) cpu=4 mem=16GB disk=100GB CPU.4V.16G $0.0279
...
Submit the run dev? [y/n]: This makes the run plan much more explicit and clear.
What's changed
- [Docs] Nebius example under
Clustersby @peterschmidt85 in #3567 - [Docs] Add get nodes rule to K8s ClusterRole by @un-def in #3571
- [Docs] Clarified the behavior of idle duration: how run's
idle_durationand fleet'sidle_durationare applied by @peterschmidt85 in #3574 - [runner] Don't bind to public addresses by @un-def in #3575
- Migrate service model base url by @peterschmidt85 in #3560
- Set explicit GPU defaults in ResourcesSpec and improve default GPU vendor selection by @peterschmidt85 in #3573
- Add
--verbosetodstack applyand enhance run plan output by @peterschmidt85 in #3572 - Cosmetical changes to the home page (font; headline; etc) by @peterschmidt85 in #3582
- Implement pipeline tasks by @r4victor in #3581
- Add pd disaggregated inference by @Bihan in #3558
- Group db migrations by @r4victor in #3583
- Clarify GPU vendor inference comments (follow-up to #3573) by @peterschmidt85 in #3588
- Kubernetes: gateway: start services via docker-systemctl-replacement by @un-def in #3584
- Remove dangling services from gateway by @jvstme in #3586
- [runner] Check capabilities(7) by @un-def in #3587
- [runner] Check if repo dir exists before chown by @un-def in #3589
Full changelog: 0.20.9...0.20.10
0.20.9
Events
UI
In the UI, both the Project and User pages now have an Events tab, providing a convenient way to track events without manually using the global filters.
On the User page, the Events tab shows events where the current user is either the Actor (the one who initiated the operation) or the Target user (the user the command was applied to):
On the Project page, the Events tab shows all events within the current project.
CLI
dstack attach
The dstack attach command now waits until the run is provisioned (similar to dstack apply), shows live progress, and attaches only after the run reaches the running state.
In addition, if a task defines ports and any of those ports cannot be forwarded to localhost (for example, because the port is already in use), both dstack attach and dstack apply now show a clear error message with a -p suggestion:
Failed to attach: port 8000 is already in use. Use -p in dstack attach to override the local port mapping, e.g. -p 8001:8000.
Kubernetes
Resources and offers
The way the kubernetes backend fetches offers has been updated. Previously, the offers reflected the node resources. Now, dstack returns only the offers that satisfy the requested range at its minimum value; for example, if you request gpu: 0..8, dstack returns only offers with gpu: 0. This makes the displayed offers closer to how runs are actually provisioned by Kubernetes.
dstack offer -b kubernetes --gpu 0..8 will return only offers with gpu: 0.
To see offers with gpu: 1, you must pass gpu: 1 or gpu: 1.. to dstack offer or dstack apply.
Note
We understand that this differs from how offers are shown for other backends, but this is the first step in improving how the kubernetes backend does provisioning. Feedback is welcome.
Proxy jump
To proxy SSH traffic inside containers, the kubernetes backend creates a proxy jump pod on startup. This requires at least one cluster node to have an external IP and relies on Kubernetes to forward this traffic even if the proxy jump pod is not running on the node with the external IP.
However, not all Kubernetes services support this behavior; for example, Nebius's Managed Kubernetes requires the proxy jump pod to run on a node with an external IP. To support these cases, the kubernetes backend now double-checks that the proxy jump pod is created correctly.
Note
The most reliable approach in such environments is still to ensure that all cluster nodes have an external IP. Feedback is welcome.
Fleets
Instances in SSH fleets are no longer automatically terminated when they become unreachable over SSH. This prevents premature termination of SSH fleet instances due to transient SSH connectivity issues.
Docs
The reference pages for .dstack.yml configurations now include more information on supported types for every property, making them more useful.
What's changed
- Events UI #3309 by @olgenn in #3532
- [runner] Write termination_{reason,message} to the log by @un-def in #3550
- Disable autoflush by @r4victor in #3553
- Update SKILL.md with authentication details and OpenAI model usage instructions by @peterschmidt85 in #3554
- Update SKILL.md to standardize run name formatting and add permissions guardrail for
dstack attachby @peterschmidt85 in #3555 - Optimize create instance on AWS by @r4victor in #3556
- [UX] Wait for run provisioning in
dstack attach; pretty-print "port in use" error duringdstack applyanddstack attachby @peterschmidt85 in #3562 - Updated schema generation script to improve type handling and user-fr… by @peterschmidt85 in #3563
- Replaced
datacrunchwithverdaby @peterschmidt85 in #3564 - Kubernetes: improve offers by @un-def in #3548
- Streamline
InstanceModel.remote_connection_infohandling by @un-def in #3566 - Kubernetes: rework jump pod provisioning by @un-def in #3561
- Don't terminate unreachable SSH instances by @un-def in #3568
Full changelog: 0.20.8...0.20.9
0.20.8
CLI
dstack event --watch
The dstack event command now supports a --watch option for real-time event tracking.
Event coverage has also been improved, with events for run in-place update and service registration now available.
dstack fleet
The dstack fleet command now includes fleet-level information such as nodes, resources, spot policy, and backend details, with individual instances listed underneath.
Skills
SKILL.md
If you're using agents such as Claude Code, Codex, Cursor, etc., it’s now possible to install dstack skills.
npx skills add dstackai/dstackThese skills make the agent fully aware of the configuration syntax and CLI commands.
Services
Probes
UI
The UI now displays probe statuses for services, helping monitor replica readiness and health.
until_ready
A new until_ready option for probes allows stopping probe execution once the ready_after threshold is reached. This is useful for resource-intensive probes that only need to run during startup:
probes:
- type: http
url: /health
until_ready: true
ready_after: 2Model probes
Services that use the model property to declare a chat model with an OpenAI-compatible interface now receive an automatically configured probe that checks model availability by requesting /v1/chat/completions.
Backends
RunPod
Community Cloud
RunPod Community Cloud is now disabled by default to ensure a more reliable experience. You can still enable Community Cloud in the backend settings. dstack Sky users can enable Community Cloud only when using their own RunPod credentials.
CUDO
Due to CUDO Compute winding down its public on-demand offering, the cudo backend is now deprecated.
What's changed
- [Docs] Replica groups by @Bihan in #3511
- [Docs] Added
Spot policyby @peterschmidt85 in #3512 - Switch UI to pagination-based projects and users API by @olgenn in #3503
- [UI] Add Spot policy configuration option to the fleet wizard by @olgenn in #3519
- Rename event target filters in UI by @jvstme in #3517
- [Docs] Add dstack skill by @peterschmidt85 in #3525
- [Docs] Remove the mention of the gateway endpoint #3514 by @peterschmidt85 in #3518
- Add service and replica registration events by @jvstme in #3516
- [Bug]: Refresh button does not work on list pages by @olgenn in #3520
- [Feature]: Show probe statuses in the UI by @olgenn in #3521
- [Runpod] Make Community Cloud an "opt-in" (disable by default) by @peterschmidt85 in #3534
- [Services] Add default probes if model is set by @peterschmidt85 in #3524
- [Docs] Update SKILL.md by @peterschmidt85 in #3536
- [CLI]:
dstack event --watchby @jvstme in #3533 - Add
/api/project/{project_name}/instances/getby @jvstme in #3535 - Add run in-place update event by @jvstme in #3540
- Add job in-place update event by @jvstme in #3541
- Add probe
until_readyconfiguration option by @jvstme in #3530 - CLI crashes with 'Operation not permitted' when log file is not writable by @peterschmidt85 in #3538
- [Docs] Removed
cudobackend by @peterschmidt85 in #3539 - [UX] Improve
dstack fleetoutput layout by @peterschmidt85 in #3529 - [UX] Remove creation_policy from Concept by @peterschmidt85 in #3542
- [Bug]: Run doesn't show Waiting runner limit exceeded in Error by @peterschmidt85 in #3546
- [Docs] Update SKILL.md by @peterschmidt85 in #3547
- Fix
probes=Noneclient incompatibility by @jvstme in #3544 - Fix
probes=Noneserver incompatibility by @jvstme in #3543
Full changelog: 0.20.7...0.20.8
0.20.7
Services
Replica groups
A service can now include multiple replica groups. Each group can define its own commands, resources spec, and scaling rules.
type: service
name: llama-8b-service
image: lmsysorg/sglang:latest
env:
- MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-8B
replicas:
- count: 1..2
scaling:
metric: rps
target: 10
commands:
- |
python -m sglang.launch_server \
--model-path $MODEL_ID \
--port 8000 \
--trust-remote-code
resources:
gpu: 48GB
- count: 1..4
scaling:
metric: rps
target: 5
commands:
- |
python -m sglang.launch_server \
--model-path $MODEL_ID \
--port 8000 \
--trust-remote-code
resources:
gpu: 24GB
port: 8000
model: deepseek-ai/DeepSeek-R1-Distill-Llama-8BNote
Properties such as regions, port, image, env and some other cannot be configured per replica group. This support is coming soon.
Note
Native support for disaggregated prefill and decode, allowing both worker types to run within a single service, is coming soon.
Events
Events are now also supported for volumes, gateways, and secrets.
$ dstack event --target-gateway my-gateway
[2026-01-28 11:53:03] [👤admin] [gateway my-gateway] Gateway created. Status: SUBMITTED
[2026-01-28 11:53:32] [gateway my-gateway] Gateway status changed SUBMITTED -> PROVISIONING
[2026-01-28 11:54:46] [gateway my-gateway] Gateway status changed PROVISIONING -> RUNNING
[2026-01-28 11:55:08] [👤admin] [gateway my-gateway] Gateway set as defaultInstance events now also include reachability and health events.
Finally, we have added Events under Concepts in the documentation.
CLI
dstack project
The dstack project and dstack project set-default commands now allow you to interactively select the default project when these commands are run without arguments.
dstack login
The dstack login command can now be run without arguments. In this case, it will interactively ask for the URL and provider if needed. If you want to use dstack Sky, you can simply press Enter without entering a URL or provider.
Also, if you have multiple projects, the command will prompt you to select the default project as well.
What's changed
- Implement pagination for
/api/project/listand/api/users/listby @r4victor in #3489 - Update dstack server CLI logo by @r4victor in #3438
- Move pytest.ini options to pyproject.toml by @r4victor in #3491
- [UX] Make
dstack projectanddstack project set-defaultinteractive for default project selection by @peterschmidt85 in #3488 - Add replica groups in dstack-service by @Bihan in #3408
- [chore]: Add
list_eventsutility for unit tests by @jvstme in #3493 - [Docs]: Fix k8s backend config example by @jvstme in #3495
- Move ruff.toml to pyproject.toml by @r4victor in #3496
- Events: instance/job reachability and health by @jvstme in #3482
- Volume events by @jvstme in #3494
- Set INSTANCE_UNREACHABLE for unreachable on-demand instances by @r4victor in #3497
- Support gateway events in API, CLI, and UI by @jvstme in #3499
- Use numeric replica-group names by @Bihan in #3502
- Add gateway lifecycle events by @jvstme in #3500
- Docs minor improvements by @peterschmidt85 in #3501
- Support secret events in API, CLI, and UI by @jvstme in #3504
- [Docs] Events #3397 by @peterschmidt85 in #3506
- [UX] Extend
dstack loginwith interactive selection ofurland default project by @peterschmidt85 in #3492 - Add secret lifecycle events by @jvstme in #3505
- Fix apply plan compatibility with old servers by @jvstme in #3507
- [UI] Minor tweaks by @peterschmidt85 in #3508
- Fix
dstack eventcompat. with older servers by @jvstme in #3509 - Fix scaling during update to replica groups by @jvstme in #3510
Full changelog: 0.20.6...0.20.7
0.20.6
Server deployment
Memory optimization
This release reduces peak server memory usage. Previously, memory grew with the total number of instances ever submitted; this is now fixed. We recommend upgrading if memory usage increases over time.
Logs storage
Fluent Bit + Elasticsearch/OpenSearch
Run logs can now be stored in your own log storage via Fluent Bit. At the same time, dstack can now read run logs from Elasticsearch/OpenSearch (to display in the UI and CLI), if Fluent Bit ships the logs there.
See the docs for more details.
Fleets
Since 0.20, dstack requires at least one fleet to be created before you can submit any runs. To make this easier, we’ve simplified default fleet creation during project setup in the UI:
In addition, if your project doesn’t have a fleet, the UI will prompt you to create one.
What's Changed
- Hotfix. Fixed generation fleet fields in project forms by @olgenn in #3486
- Add missing Box imports by @r4victor in #3485
- Use the same metrics endpoint label for 404 requests by @r4victor in #3455
- Refactoring Inspect page by @olgenn in #3457
- Migrate from Slurm by @peterschmidt85 in #3454
- [Internal]: Handle GitHub API errors in
release_notes.pyby @jvstme in #3463 - Display
InstanceAvailability.NO_BALANCEin CLI by @jvstme in #3460 - Do not return
NO_BALANCEto older clients by @jvstme in #3462 - Optimize job submissions loading by @r4victor in #3466
- [CLI] Add
--memoryoption toapplyandofferby @un-def in #3461 - [runner] Rework and fix user processing by @un-def in #3456
- Optimize fleet instances db queries by @r4victor in #3467
- Kubernetes: adjust offer GPU count by @un-def in #3469
- Add missing job status change event for scaling by @jvstme in #3465
- Fix
find_optimal_fleet_with_offerslog message by @un-def in #3470 - Fix missing instance lock in delete_fleets by @r4victor in #3471
- Optimize list and get fleets by @r4victor in #3472
- feat(logging): add fluent-bit log shipping by @DragonStuff in #3431
- Adjust fluent-bit logging integration by @r4victor in #3478
- Emit events for instance status changes by @jvstme in #3477
- [runner] Restore
--home-diroption as no-op by @un-def in #3480 - [UI] Default fleet in project wizard by @olgenn in #3464
- Support shared AWS compute caches by @r4victor in #3483
- [UI] Minor re-order in the sidebar by @peterschmidt85 in #3484
Full changelog: 0.20.3...0.20.6
0.20.5
0.20.4
Warning
Be sure to update to 0.20.6, which includes important fixes.
What's changed
- Use the same metrics endpoint label for 404 requests by @r4victor in #3455
- Refactoring Inspect page by @olgenn in #3457
- Migrate from Slurm by @peterschmidt85 in #3454
- [Internal]: Handle GitHub API errors in
release_notes.pyby @jvstme in #3463 - Display
InstanceAvailability.NO_BALANCEin CLI by @jvstme in #3460 - Do not return
NO_BALANCEto older clients by @jvstme in #3462 - Optimize job submissions loading by @r4victor in #3466
- [CLI] Add
--memoryoption toapplyandofferby @un-def in #3461 - [runner] Rework and fix user processing by @un-def in #3456
- Optimize fleet instances db queries by @r4victor in #3467
- Kubernetes: adjust offer GPU count by @un-def in #3469
- Add missing job status change event for scaling by @jvstme in #3465
- Fix
find_optimal_fleet_with_offerslog message by @un-def in #3470 - Fix missing instance lock in delete_fleets by @r4victor in #3471
- Optimize list and get fleets by @r4victor in #3472
- feat(logging): add fluent-bit log shipping by @DragonStuff in #3431
- Adjust fluent-bit logging integration by @r4victor in #3478
- Emit events for instance status changes by @jvstme in #3477
- [runner] Restore
--home-diroption as no-op by @un-def in #3480 - [UI] Default fleet in project wizard by @olgenn in #3464
- Support shared AWS compute caches by @r4victor in #3483
- [UI] Minor re-order in the sidebar by @peterschmidt85 in #3484
Full changelog: 0.20.3...0.20.4
0.20.3
Dev environments
Windsurf IDE
Dev environments now support Windsurf as a first-class IDE option alongside VSCode and Cursor.
type: dev-environment
ide: windsurf
repos:
- https://github.com/dstackai/dstack
resources:
gpu: 24GB..:1dstack provisions an instance for your dev environment and seamlessly connects your local Windsurf editor to it.
Troubleshooting
Runs/fleets/volumes/gateways JSON via CLI
You can now inspect the full JSON state of runs, fleets, volumes, and gateways using these CLI commands:
$ dstack run get <name> --json
$ dstack fleet get <name> --json
$ dstack volume get <name> --json
$ dstack gateway get <name> --jsonRuns/fleets JSON via UI
The UI includes new "Inspect" tabs with read-only JSON viewers for runs and fleets, making it easier to debug and understand resource states.
What's changed
- Adjust kubernetes gpu matching for RTX5090 by @r4victor in #3440
- [runner] Fix MPI hostfile by @un-def in #3441
- [Crusoe] Minor edits by @peterschmidt85 in #3448
- [Dev environments] Support windsurf IDE by @peterschmidt85 in #3444
- Add
processing instancedebug log message by @jvstme in #3450 - [runner] Decouple Server and Executor by @un-def in #3447
- [Feature] Allow to see JSON state of runs/volumes/fleets/gateways via CLI/UI by @peterschmidt85 in #3445
Full Changelog: 0.20.2...0.20.3
0.20.2
What's Changed
- Fix TestRemoveDanglingTasks by @un-def in #3426
- [runner] Configure and start sshd by @un-def in #3421
- Resolve url for dstack login by @r4victor in #3427
- [shim] Fix DockerRunner tests by @un-def in #3429
- Remove httpx duplicated in dev deps by @r4victor in #3433
- [UX] Better "No fleets" messages; plus updated
Troubleshootingguide by @peterschmidt85 in #3428 - [runner] Streamline authorized_keys management by @un-def in #3435
- Change /dstack/venv ownership to the current user by @un-def in #3437
- [UX] Add an API that returns projects that lack active fleets by @peterschmidt85 in #3425
- Make no fleet notifications dismissible by @r4victor in #3439
Full Changelog: 0.20.1...0.20.2
0.20.1
CLI
No-fleets warning
Since the last major release, fleets are required before submitting runs. This update makes that requirement explicit in the CLI.
When a run is submitted for a project that has no fleets, the CLI now shows a dedicated warning. The run status has also been updated in both the CLI and UI to No fleets instead of No offers.
This removes ambiguity around failed runs that previously appeared as No offers.
dstack login
If you're using dstack Sky or dstack Enterprise, you can now authenticate the CLI using a new command, dstack login, instead of manually providing a token.
dstack Sky supports authentication via GitHub. dstack Enterprise supports SSO with providers such as Okta, Microsoft Entra ID, and Google.
Services
Service configurations now support gateway: true.
For services that require gateway features (such as auto-scaling, custom domains, WebSockets, etc), this property makes the requirement explicit. When set, dstack ensures a default gateway is present.
dstack-shim
In addition to the dstack-runner auto-update mechanism introduced in 0.20.0, dstack-shim now also supports auto-updating.
See contributing/RUNNER-AND-SHIM.md for details.
What's changed
- [Docs] Reflect the 0.20 changes related to
working_dirandrepo_dirby @peterschmidt85 in #3356 - [Docs]: Fix environment variables reference layout by @jvstme in #3396
- Add more events about users and projects by @jvstme in #3390
- Implement shim auto-update by @un-def in #3395
- [Fleets] Updated error message and docs by @peterschmidt85 in #3377
- [Blog] dstack 0.20 GA: Fleet-first UX and other important changes by @peterschmidt85 in #3401
- [runner] Get container cgroup path from procfs by @un-def in #3402
- [Internal] Add an index for user email by @peterschmidt85 in #3409
- Don't send
asyncio.CancelledErrorto Sentry by @un-def in #3404 - [Internal] Allow passing
AnyActortoupdate_userby @peterschmidt85 in #3410 - Replace
Instance.termination_reasonvalues with codes by @peterschmidt85 in #3187 - [Docs] Added the
Lambdaexample underClustersby @peterschmidt85 in #3407 - [runner] Revamp
main.goby @un-def in #3411 - Was implemented Event list for job, run and fleet by @olgenn in #3392
- Fix event target type rendering in server logs by @jvstme in #3414
- Support
gateway: truein service configurations by @jvstme in #3413 - Implement
dstack logincommand and CLI OAuth flow by @r4victor in #3415 - Allow users to delete their only project by @jvstme in #3416
- Indicate deleted actors and projects in Events API by @jvstme in #3422
- [UX] Make "No fleets" run status more explicit #3405 by @peterschmidt85 in #3406
- Bump
gpuhunt==0.1.16by @r4victor in #3423
Full changelog: 0.20.0...0.20.1
