Colab: Failed to create sample: ❌ Error generating from formatted prompt: FlashAttention only supports Ampere GPUs or newer. #393

@grigio

Description

⏳ Starting Model Download... (This prevents timeouts)
Initializing download sequence...
2026-02-23 17:09:13.386103: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1771866553.605067 1180 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1771866553.667298 1180 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1771866554.107458 1180 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1771866554.107494 1180 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1771866554.107499 1180 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1771866554.107505 1180 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
2026-02-23 17:09:14.150921: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Flax classes are deprecated and will be removed in Diffusers v1.0.0. We recommend migrating to PyTorch classes or pinning your version of Diffusers.
Downloading ACE-Step v1.5 Unified Repo (This includes DiT and LLM)...
This may take a while (approx 10GB)...
2026-02-23 17:09:22.359 | INFO | acestep.handler:_ensure_model_downloaded:168 - Model /content/Ace-Step-v1.5 already exists at /content/Ace-Step-v1.5

✅ Download complete.

🚀 Models ready. Launching ACE-Step Interface...
🔗 Click the public link ending in 'gradio.live' below once it appears.

2026-02-23 17:09:34.897478: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1771866574.935153 1344 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1771866574.948791 1344 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1771866574.977707 1344 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1771866574.977743 1344 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1771866574.977752 1344 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1771866574.977759 1344 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
2026-02-23 17:09:34.984878: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Flax classes are deprecated and will be removed in Diffusers v1.0.0. We recommend migrating to PyTorch classes or pinning your version of Diffusers.
Using local storage (non-persistent): /content/Ace-Step-v1.5/data
Note: To enable persistent storage, configure it in HuggingFace Space settings
Detected GPU memory: 14.56 GB (< 16GB)
Auto-enabling CPU offload to reduce GPU memory usage
Creating handlers...
Service mode configuration:
DiT model 1: acestep-v15-turbo
LM model: acestep-5Hz-lm-1.7B
Backend: vllm
Offload to CPU: True
DEBUG_UI: False
ZeroGPU: False
Flash Attention: False
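
For context on the "Flash Attention: False" line above: FlashAttention kernels require an NVIDIA GPU with compute capability 8.0 (Ampere) or newer, and the 14.56 GB card detected here is most likely a Colab free-tier T4 (Turing, compute capability 7.5). A minimal sketch of that check, with the helper name purely illustrative and not taken from the ACE-Step code:

```python
def flash_attention_supported(capability: tuple) -> bool:
    """FlashAttention needs compute capability >= (8, 0), i.e. Ampere+.
    In a real setup `capability` would come from
    torch.cuda.get_device_capability()."""
    return capability >= (8, 0)

# Colab's free-tier T4 (Turing) reports (7, 5) -> unsupported
print(flash_attention_supported((7, 5)))  # False
print(flash_attention_supported((8, 0)))  # True (A100 / Ampere)
```

On unsupported GPUs the safe fallback is PyTorch's built-in SDPA attention, which is what the DiT loader below does ("attention implementation: sdpa"); the error in the title suggests a later component still attempts a FlashAttention path.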
Initializing DiT model 1: acestep-v15-turbo...
2026-02-23 17:09:45.821 | INFO | acestep.handler:_ensure_model_downloaded:181 - Downloading unified repository ACE-Step/Ace-Step1.5 to /content/Ace-Step-v1.5/data/checkpoints...
Fetching 28 files: 0% 0/28 [00:00<?, ?steps/s]
vae/diffusion_pytorch_model.safetensors: 100% 337M/337M [00:17<00:00, 19.7MB/s]
acestep-v15-turbo/silence_latent.pt: 100% 3.84M/3.84M [00:00<00:00, 10.4MB/s]
Qwen3-Embedding-0.6B/model.safetensors: 100% 1.19G/1.19G [01:36<00:00, 12.3MB/s]
acestep-5Hz-lm-1.7B/model.safetensors: 100% 3.71G/3.71G [01:38<00:00, 37.8MB/s]
acestep-v15-turbo/model.safetensors: 100% 4.79G/4.79G [01:41<00:00, 47.1MB/s]
Fetching 28 files: 100% 28/28 [01:42<00:00, 3.65s/steps]
2026-02-23 17:11:28.205 | INFO | acestep.handler:_ensure_model_downloaded:195 - Repository ACE-Step/Ace-Step1.5 downloaded successfully to /content/Ace-Step-v1.5/data/checkpoints
2026-02-23 17:11:28.205 | INFO | acestep.handler:initialize_service:463 - [initialize_service] Attempting to load model with attention implementation: sdpa
2026-02-23 17:11:30.030 | INFO | acestep.handler:initialize_service:486 - [initialize_service] Keeping main model on cuda (persistent)
torch_dtype is deprecated! Use dtype instead!
Fetching 7 files: 0% 0/7 [00:00<?, ?steps/s]
flash_attn_interface.py: 41.3kB [00:00, 91.4MB/s]
build/torch29-cxx11-cu128-x86_64-linux/(…): 100% 804M/804M [00:03<00:00, 218MB/s]
Fetching 7 files: 100% 7/7 [00:03<00:00, 1.80steps/s]
2026-02-23 17:11:51.748 | INFO | acestep.handler:initialize_service:574 - [initialize_service] Text encoder loaded with kernels-community/flash-attn3
DiT model 1 initialized successfully
Initializing 5Hz LM: acestep-5Hz-lm-1.7B...
2026-02-23 17:11:51.755 | INFO | acestep.llm_inference:initialize:374 - [LLM Init Debug] IS_ZEROGPU=False, IS_HUGGINGFACE_SPACE=False
2026-02-23 17:11:51.755 | INFO | acestep.llm_inference:initialize:375 - [LLM Init Debug] torch.cuda.is_available()=True
2026-02-23 17:11:51.755 | INFO | acestep.llm_inference:initialize:376 - [LLM Init Debug] device=cuda, offload_to_cpu=True
2026-02-23 17:11:51.755 | INFO | acestep.llm_inference:initialize:392 - loading 5Hz LM tokenizer... it may take 80~90s
2026-02-23 17:12:22.264 | INFO | acestep.llm_inference:initialize:396 - 5Hz LM tokenizer loaded successfully in 30.51 seconds
2026-02-23 17:12:22.264 | INFO | acestep.llm_inference:initialize:400 - Initializing constrained decoding processor...
2026-02-23 17:12:23.755 | WARNING | acestep.constrained_logits_processor:_precompute_audio_code_tokens:545 - Found 1535 audio code tokens with values outside valid range [0, 63999]
2026-02-23 17:12:28.153 | INFO | acestep.llm_inference:initialize:407 - Constrained processor initialized in 5.89 seconds
2026-02-23 17:12:30.063 | ERROR | acestep.llm_inference:_initialize_5hz_lm_vllm:445 - nano-vllm is not installed. Please install it using 'cd acestep/third_parts/nano-vllm && pip install .
2026-02-23 17:12:30.063 | INFO | acestep.llm_inference:initialize:413 - 5Hz LM status message: ❌ nano-vllm is not installed. Please install it using 'cd acestep/third_parts/nano-vllm && pip install .
2026-02-23 17:12:30.063 | WARNING | acestep.llm_inference:initialize:418 - vllm initialization failed, falling back to PyTorch backend
2026-02-23 17:12:30.063 | INFO | acestep.llm_inference:_load_pytorch_model:237 - [LLM Load] Attempting to load model with attention implementation: kernels-community/flash-attn3
Fetching 7 files: 100% 7/7 [00:00<00:00, 19040.29steps/s]
Fetching 7 files: 100% 7/7 [00:00<00:00, 10145.17steps/s]
2026-02-23 17:12:30.935 | INFO | acestep.llm_inference:_load_pytorch_model:251 - [LLM Load Debug] Model loaded with kernels-community/flash-attn3, initial device: cpu
2026-02-23 17:12:30.946 | INFO | acestep.llm_inference:_load_pytorch_model:256 - [LLM Load Debug] After .to(), model device: cpu
2026-02-23 17:12:30.951 | INFO | acestep.llm_inference:_load_pytorch_model:262 - 5Hz LM initialized successfully using PyTorch backend on cuda
5Hz LM initialized successfully
Service initialization completed!
Creating Gradio interface...
Enabling queue for multi-user support...
Launching server on 0.0.0.0:7860...

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run gradio deploy from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
2026-02-23 17:16:32.395 | INFO | acestep.inference:create_sample:914 - [create_sample Debug] Entry: IS_HUGGINGFACE_SPACE=False
2026-02-23 17:16:32.395 | INFO | acestep.inference:create_sample:915 - [create_sample Debug] torch.cuda.is_available()=True
2026-02-23 17:16:32.396 | INFO | acestep.inference:create_sample:917 - [create_sample Debug] torch.cuda.current_device()=0
2026-02-23 17:16:32.396 | INFO | acestep.inference:create_sample:918 - [create_sample Debug] llm_handler.device=cuda, llm_handler.offload_to_cpu=True
2026-02-23 17:16:32.396 | INFO | acestep.inference:create_sample:921 - [create_sample Debug] Model device: cpu
2026-02-23 17:16:32.396 | INFO | acestep.llm_inference:create_sample_from_query:1648 - Creating sample from query: una canzone italiana italo disco su fleximan, il supereroe che taglia i pali con la sega circolare p... (instrumental=[], vocal_language=it)
2026-02-23 17:16:32.433 | DEBUG | acestep.llm_inference:create_sample_from_query:1655 - Formatted prompt for inspiration: <|im_start|>system

Instruction

Expand the user's input into a more detailed and specific musical description:

<|im_end|>
<|im_start|>user
una canzone italiana italo disco su fleximan, il supereroe che taglia i pali con la sega circolare per farci circolare

instrumental: false<|im_end|>
<|im_start|>assistant

2026-02-23 17:16:32.433 | INFO | acestep.llm_inference:create_sample_from_query:1664 - Using user-specified language: it
2026-02-23 17:16:32.435 | DEBUG | acestep.constrained_logits_processor:set_target_duration:1269 - Target duration cleared, no duration constraint
2026-02-23 17:16:32.435 | DEBUG | acestep.constrained_logits_processor:set_user_metadata:425 - User provided metadata fields: ['language']
2026-02-23 17:16:32.435 | INFO | acestep.llm_inference:_load_model_context:2379 - [_load_model_context Debug] Entry: offload_to_cpu=True, backend=pt, self.device=cuda
2026-02-23 17:16:32.435 | INFO | acestep.llm_inference:_load_model_context:2380 - [_load_model_context Debug] torch.cuda.is_available()=True, IS_ZEROGPU=False
2026-02-23 17:16:32.435 | INFO | acestep.llm_inference:_load_model_context:2385 - [_load_model_context Debug] Model current device: cpu
2026-02-23 17:16:32.435 | INFO | acestep.llm_inference:_load_model_context:2397 - [_load_model_context Debug] Moving model from CPU to cuda
2026-02-23 17:16:46.959 | INFO | acestep.llm_inference:_load_model_context:2399 - [_load_model_context Debug] Model now on: cuda:0
2026-02-23 17:16:46.960 | INFO | acestep.llm_inference:_load_model_context:2417 - Loading LLM to cuda
2026-02-23 17:16:46.966 | INFO | acestep.llm_inference:_load_model_context:2422 - Loaded LLM to cuda in 0.0064s
2026-02-23 17:16:46.966 | INFO | acestep.llm_inference:_run_pt_single:645 - [_run_pt_single Debug] Inputs moved to model device: cuda:0
2026-02-23 17:16:46.966 | INFO | acestep.llm_inference:_run_pt_single:646 - [_run_pt_single Debug] Input actual device: cuda:0
2026-02-23 17:16:47.626 | INFO | acestep.llm_inference:_load_model_context:2428 - Offloading LLM to CPU
2026-02-23 17:16:50.540 | INFO | acestep.llm_inference:_load_model_context:2434 - Offloaded LLM to CPU in 2.9136s
Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/gradio/queueing.py", line 766, in process_events
response = await route_utils.call_process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/gradio/route_utils.py", line 355, in call_process_api
output = await app.get_blocks().process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/gradio/blocks.py", line 2147, in process_api
result = await self.call_function(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/gradio/blocks.py", line 1641, in call_function
prediction = await utils.async_iteration(iterator)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/gradio/utils.py", line 859, in async_iteration
return await anext(iterator)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/gradio/utils.py", line 850, in anext
return await anyio.to_thread.run_sync(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/anyio/to_thread.py", line 63, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/anyio/_backends/_asyncio.py", line 2502, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/anyio/_backends/_asyncio.py", line 986, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/gradio/utils.py", line 833, in run_sync_iterator_async
return next(iterator)
^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/gradio/utils.py", line 1017, in gen_wrapper
response = next(iterator)
^^^^^^^^^^^^^^
File "/content/Ace-Step-v1.5/acestep/gradio_ui/events/init.py", line 716, in generation_wrapper
raise gr.Error(f"Failed to create sample: {result.status_message}")
gradio.exceptions.Error: 'Failed to create sample: ❌ Error generating from formatted prompt: FlashAttention only supports Ampere GPUs or newer.'
(A second generation attempt at 17:17 fails with the identical traceback and error: `FlashAttention only supports Ampere GPUs or newer.`)
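For context: FlashAttention 2 requires an NVIDIA GPU of compute capability 8.0 or newer (Ampere), while the free Colab tier typically assigns a T4 (Turing, sm_75), which triggers exactly this error. A possible workaround, assuming the model is loaded through `transformers` (the `pick_attn_implementation` helper name below is hypothetical, not part of Ace-Step), is to select the attention backend from the device's compute capability and fall back to PyTorch SDPA on older GPUs:

```python
def pick_attn_implementation(cuda_available: bool, capability_major: int) -> str:
    """Choose a safe `attn_implementation` value for transformers' from_pretrained().

    FlashAttention 2 needs compute capability >= 8.0 (Ampere or newer).
    Colab's free T4 (Turing, sm_75) and V100 (Volta, sm_70) do not qualify,
    so fall back to PyTorch's scaled_dot_product_attention ("sdpa"),
    which runs on any GPU (and on CPU).
    """
    if cuda_available and capability_major >= 8:
        return "flash_attention_2"
    return "sdpa"
```

In a live session the inputs would come from `torch.cuda.is_available()` and `torch.cuda.get_device_capability()`, and the result would be passed as `attn_implementation=` to `AutoModelForCausalLM.from_pretrained(...)`. Whether Ace-Step exposes such an option in its own loader is an assumption; it may require patching where the 5Hz LM is initialized.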
