Skip to content

Releases: dstackai/dstack

0.20.10

19 Feb 12:42
008efc8

Choose a tag to compare

Services

Prefill-Decode disaggregation

dstack now supports disaggregated Prefill–Decode inference, allowing both Prefill and Decode worker types to run within a single service.

To define and run such a service, set pd_disaggregation to true under the router property (this requires the gateway to use the sglang router, and define separate replica groups for Prefill and Decode worker types:

type: service
name: prefill-decode

env:
  - HF_TOKEN
  - MODEL_ID=zai-org/GLM-4.5-Air-FP8

image: lmsysorg/sglang:latest

replicas:
  - count: 1..4
    scaling:
      metric: rps
      target: 3
    commands:
      - |
          python -m sglang.launch_server \
            --model-path $MODEL_ID \
            --disaggregation-mode prefill \
            --disaggregation-transfer-backend mooncake \
            --host 0.0.0.0 \
            --port 8000 \
            --disaggregation-bootstrap-port 8998
    resources:
      gpu: H200

  - count: 1..8
    scaling:
      metric: rps
      target: 2
    commands:
      - |
          python -m sglang.launch_server \
            --model-path $MODEL_ID \
            --disaggregation-mode decode \
            --disaggregation-transfer-backend mooncake \
            --host 0.0.0.0 \
            --port 8000
    resources:
      gpu: H200

port: 8000
model: zai-org/GLM-4.5-Air-FP8

probes:
  - type: http
    url: /health_generate
    interval: 15s

router:
  type: sglang
  pd_disaggregation: true

Note

Note, pd_disaggregation requires both the gateway and replicas to use the same cluster. With dstack, this can now be used with the aws, gcp, kubernetes backends (as they support creating both clusters and gateways). Support for more backends (and eventually SSH fleets) is coming soon.

Currently, pd_disaggregation works only with SGLang. Support for vLLM is coming soon.

Support for additional scaling metrics, such as TTFT and ITL, is also coming soon to enable autoscaling of Prefill and Decode workers.

Model endpoint

If you configure the model property, dstack previously provided a global model endpoint at gateway.<gateway domain> (or /proxy/models/<project name>), allowing access to all models deployed in the project. This endpoint has been deprecated.

Now, any deployed model should be accessed via the service endpoint itself at <run name>.<gateway domain> (or /proxy/services/main/<service name>).

Note

If you configure the model property, dstack automatically enables CORS on the service endpoint. Future versions will allow you to disable or customize this behavior.

CLI

dstack apply

Previously, if you did not specify gpu, dstack treated it as 0..1 but did not display it in the run plan. Now, dstack properly displays this default. Additionally, if you do not specify image, dstack automatically defaults the vendor to nvidia.

dstack apply -f dev.dstack.yml
 Project              peterschmidt85
 User                 peterschmidt85
 Type                 dev-environment
 Resources            cpu=2.. mem=8GB.. disk=100GB.. gpu=0..
 Spot policy          on-demand
 Max price            off
 Retry policy         off
 Idle duration        5m
 Max duration         off
 Inactivity duration  off

 #  BACKEND         RESOURCES                  INSTANCE TYPE  PRICE
 1  verda (FIN-01)  cpu=4 mem=16GB disk=100GB  CPU.4V.16G     $0.0279
 2  verda (FIN-02)  cpu=4 mem=16GB disk=100GB  CPU.4V.16G     $0.0279
 3  verda (FIN-03)  cpu=4 mem=16GB disk=100GB  CPU.4V.16G     $0.0279
    ...

Submit the run dev? [y/n]: 

This makes the run plan much more explicit and clear.

What's changed

Full changelog: 0.20.9...0.20.10

0.20.9

12 Feb 13:12
c4ed6ca

Choose a tag to compare

Events

UI

In the UI, both the Project and User pages now have an Events tab, providing a convenient way to track events without manually using the global filters.

On the User page, the Events tab shows events where the current user is either the Actor (the one who initiated the operation) or the Target user (the user the command was applied to):

On the Project page, the Events tab shows all events within the current project.

CLI

dstack attach

The dstack attach command now waits until the run is provisioned (similar to dstack apply), shows live progress, and attaches only after the run reaches the running state.

In addition, if a task defines ports and any of those ports cannot be forwarded to localhost (for example, because the port is already in use), both dstack attach and dstack apply now show a clear error message with a -p suggestion:

Failed to attach: port 8000 is already in use. Use -p in dstack attach to override the local port mapping, e.g. -p 8001:8000.

Kubernetes

Resources and offers

The way the kubernetes backend fetches offers has been updated. Previously, the offers reflected the node resources. Now, dstack returns only the offers that satisfy the requested range at its minimum value; for example, if you request gpu: 0..8, dstack returns only offers with gpu: 0. This makes the displayed offers closer to how runs are actually provisioned by Kubernetes.

dstack offer -b kubernetes --gpu 0..8 will return only offers with gpu: 0.

To see offers with gpu: 1, you must pass gpu: 1 or gpu: 1.. to dstack offer or dstack apply.

Note

We understand that this differs from how offers are shown for other backends, but this is the first step in improving how the kubernetes backend does provisioning. Feedback is welcome.

Proxy jump

To proxy SSH traffic inside containers, the kubernetes backend creates a proxy jump pod on startup. This requires at least one cluster node to have an external IP and relies on Kubernetes to forward this traffic even if the proxy jump pod is not running on the node with the external IP.

However, not all Kubernetes services support this behavior; for example, Nebius's Managed Kubernetes requires the proxy jump pod to run on a node with an external IP. To support these cases, the kubernetes backend now double-checks that the proxy jump pod is created correctly.

Note

The most reliable approach in such environments is still to ensure that all cluster nodes have an external IP. Feedback is welcome.

Fleets

Instances in SSH fleets are no longer automatically terminated when they become unreachable over SSH. This prevents premature termination of SSH fleet instances due to transient SSH connectivity issues.

Docs

The reference pages for .dstack.yml configurations now include more information on supported types for every property, making them more useful.

What's changed

Full changelog: 0.20.8...0.20.9

0.20.8

05 Feb 11:46
3149be8

Choose a tag to compare

CLI

dstack event --watch

The dstack event command now supports a --watch option for real-time event tracking.

video

Event coverage has also been improved, with events for run in-place update and service registration now available.

dstack fleet

The dstack fleet command now includes fleet-level information such as nodes, resources, spot policy, and backend details, with individual instances listed underneath.

dstack-fleet

Skills

SKILL.md

If you're using agents such as Claude Code, Codex, Cursor, etc., it’s now possible to install dstack skills.

npx skills add dstackai/dstack

These skills make the agent fully aware of the configuration syntax and CLI commands.

Screenshot 2026-02-05 at 11 54 18

Services

Probes

UI

The UI now displays probe statuses for services, helping monitor replica readiness and health.

ui-probes

until_ready

A new until_ready option for probes allows stopping probe execution once the ready_after threshold is reached. This is useful for resource-intensive probes that only need to run during startup:

probes:
  - type: http
     url: /health
     until_ready: true
     ready_after: 2

Model probes

Services that use the model property to declare a chat model with an OpenAI-compatible interface now receive an automatically configured probe that checks model availability by requesting /v1/chat/completions.

Backends

RunPod

Community Cloud

RunPod Community Cloud is now disabled by default to ensure a more reliable experience. You can still enable Community Cloud in the backend settings. dstack Sky users can enable Community Cloud only when using their own RunPod credentials.

CUDO

Due to CUDO Compute winding down its public on-demand offering, the cudo backend is now deprecated.

What's changed

Full changelog: 0.20.7...0.20.8

0.20.7

28 Jan 16:48
763092d

Choose a tag to compare

Services

Replica groups

A service can now include multiple replica groups. Each group can define its own commands, resources spec, and scaling rules.

type: service
name: llama-8b-service

image: lmsysorg/sglang:latest
env:
  - MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Llama-8B

replicas:
  - count: 1..2
    scaling:
      metric: rps
      target: 10
    commands:
      - |
        python -m sglang.launch_server \
          --model-path $MODEL_ID \
          --port 8000 \
          --trust-remote-code
    resources:
      gpu: 48GB

  - count: 1..4
    scaling:
      metric: rps
      target: 5
    commands:
      - |
        python -m sglang.launch_server \
          --model-path $MODEL_ID \
          --port 8000 \
          --trust-remote-code
    resources:
      gpu: 24GB

port: 8000
model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B

Note

Properties such as regions, port, image, env and some other cannot be configured per replica group. This support is coming soon.

Note

Native support for disaggregated prefill and decode, allowing both worker types to run within a single service, is coming soon.

Events

Events are now also supported for volumes, gateways, and secrets.

$ dstack event --target-gateway my-gateway
[2026-01-28 11:53:03] [👤admin] [gateway my-gateway] Gateway created. Status: SUBMITTED
[2026-01-28 11:53:32] [gateway my-gateway] Gateway status changed SUBMITTED -> PROVISIONING
[2026-01-28 11:54:46] [gateway my-gateway] Gateway status changed PROVISIONING -> RUNNING
[2026-01-28 11:55:08] [👤admin] [gateway my-gateway] Gateway set as default

Instance events now also include reachability and health events.

Finally, we have added Events under Concepts in the documentation.

CLI

dstack project

The dstack project and dstack project set-default commands now allow you to interactively select the default project when these commands are run without arguments.

dstack-cli-project

dstack login

The dstack login command can now be run without arguments. In this case, it will interactively ask for the URL and provider if needed. If you want to use dstack Sky, you can simply press Enter without entering a URL or provider.

dstack-cli-login

Also, if you have multiple projects, the command will prompt you to select the default project as well.

What's changed

Full changelog: 0.20.6...0.20.7

0.20.6

21 Jan 13:31
f09d061

Choose a tag to compare

Server deployment

Memory optimization

This release reduces peak server memory usage. Previously, memory grew with the total number of instances ever submitted; this is now fixed. We recommend upgrading if memory usage increases over time.

Logs storage

Fluent Bit + Elasticsearch/OpenSearch

Run logs can now be stored in your own log storage via Fluent Bit. At the same time, dstack can now read run logs from Elasticsearch/OpenSearch (to display in the UI and CLI), if Fluent Bit ships the logs there.

See the docs for more details.

Fleets

Since 0.20, dstack requires at least one fleet to be created before you can submit any runs. To make this easier, we’ve simplified default fleet creation during project setup in the UI:

In addition, if your project doesn’t have a fleet, the UI will prompt you to create one.

What's Changed

Full changelog: 0.20.3...0.20.6

0.20.5

21 Jan 11:31
6d14aad

Choose a tag to compare

Warning

Be sure to update to 0.20.6, which includes important fixes.

What's Changed

Full Changelog: 0.20.4...0.20.5

0.20.4

21 Jan 10:45
32fbc02

Choose a tag to compare

Warning

Be sure to update to 0.20.6, which includes important fixes.

What's changed

Full changelog: 0.20.3...0.20.4

0.20.3

08 Jan 18:03
d48b15f

Choose a tag to compare

Dev environments

Windsurf IDE

Dev environments now support Windsurf as a first-class IDE option alongside VSCode and Cursor.

type: dev-environment
ide: windsurf

repos:
- https://github.com/dstackai/dstack

resources:
  gpu: 24GB..:1

dstack provisions an instance for your dev environment and seamlessly connects your local Windsurf editor to it.

dstack-windsurf-dev-environment-min

Troubleshooting

Runs/fleets/volumes/gateways JSON via CLI

You can now inspect the full JSON state of runs, fleets, volumes, and gateways using these CLI commands:

$ dstack run get <name> --json
$ dstack fleet get <name> --json
$ dstack volume get <name> --json
$ dstack gateway get <name> --json

Runs/fleets JSON via UI

The UI includes new "Inspect" tabs with read-only JSON viewers for runs and fleets, making it easier to debug and understand resource states.

dstack-inspect-ui-min

What's changed

Full Changelog: 0.20.2...0.20.3

0.20.2

30 Dec 10:22
3e931d9

Choose a tag to compare

What's Changed

Full Changelog: 0.20.1...0.20.2

0.20.1

25 Dec 15:03
178abdc

Choose a tag to compare

CLI

No-fleets warning

Since the last major release, fleets are required before submitting runs. This update makes that requirement explicit in the CLI.

Screenshot 2025-12-25 at 15 39 00

When a run is submitted for a project that has no fleets, the CLI now shows a dedicated warning. The run status has also been updated in both the CLI and UI to No fleets instead of No offers.

This removes ambiguity around failed runs that previously appeared as No offers.

dstack login

If you're using dstack Sky or dstack Enterprise, you can now authenticate the CLI using a new command, dstack login, instead of manually providing a token.

Screenshot 2025-12-25 at 15 42 30

dstack Sky supports authentication via GitHub. dstack Enterprise supports SSO with providers such as Okta, Microsoft Entra ID, and Google.

Services

Service configurations now support gateway: true.

For services that require gateway features (such as auto-scaling, custom domains, WebSockets, etc), this property makes the requirement explicit. When set, dstack ensures a default gateway is present.

dstack-shim

In addition to the dstack-runner auto-update mechanism introduced in 0.20.0, dstack-shim now also supports auto-updating.

See contributing/RUNNER-AND-SHIM.md for details.

What's changed

Full changelog: 0.20.0...0.20.1