Colab: Failed to create sample: ❌ Error generating from formatted prompt: FlashAttention only supports Ampere GPUs or newer. #393

@grigio

Description

⏳ Starting Model Download... (This prevents timeouts)
Initializing download sequence...
2026-02-23 17:09:13.386103: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1771866553.605067 1180 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1771866553.667298 1180 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1771866554.107458 1180 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1771866554.107494 1180 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1771866554.107499 1180 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1771866554.107505 1180 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
2026-02-23 17:09:14.150921: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Flax classes are deprecated and will be removed in Diffusers v1.0.0. We recommend migrating to PyTorch classes or pinning your version of Diffusers.
Downloading ACE-Step v1.5 Unified Repo (This includes DiT and LLM)...
This may take a while (approx 10GB)...
2026-02-23 17:09:22.359 | INFO | acestep.handler:_ensure_model_downloaded:168 - Model /content/Ace-Step-v1.5 already exists at /content/Ace-Step-v1.5

✅ Download complete.

🚀 Models ready. Launching ACE-Step Interface...
🔗 Click the public link ending in 'gradio.live' below once it appears.

2026-02-23 17:09:34.897478: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1771866574.935153 1344 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1771866574.948791 1344 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1771866574.977707 1344 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1771866574.977743 1344 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1771866574.977752 1344 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1771866574.977759 1344 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
2026-02-23 17:09:34.984878: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Flax classes are deprecated and will be removed in Diffusers v1.0.0. We recommend migrating to PyTorch classes or pinning your version of Diffusers.
Using local storage (non-persistent): /content/Ace-Step-v1.5/data
Note: To enable persistent storage, configure it in HuggingFace Space settings
Detected GPU memory: 14.56 GB (< 16GB)
Auto-enabling CPU offload to reduce GPU memory usage
Creating handlers...
Service mode configuration:
DiT model 1: acestep-v15-turbo
LM model: acestep-5Hz-lm-1.7B
Backend: vllm
Offload to CPU: True
DEBUG_UI: False
ZeroGPU: False
Flash Attention: False
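
For context on the "Flash Attention: False" line above: FlashAttention kernels require an NVIDIA GPU with compute capability 8.0 (Ampere) or newer, and the 14.56 GB card detected here is most likely a Colab free-tier T4 (Turing, compute capability 7.5). A minimal sketch of that check, with the helper name purely illustrative and not taken from the ACE-Step code:

```python
def flash_attention_supported(capability: tuple) -> bool:
    """FlashAttention needs compute capability >= (8, 0), i.e. Ampere+.
    In a real setup `capability` would come from
    torch.cuda.get_device_capability()."""
    return capability >= (8, 0)

# Colab's free-tier T4 (Turing) reports (7, 5) -> unsupported
print(flash_attention_supported((7, 5)))  # False
print(flash_attention_supported((8, 0)))  # True (A100 / Ampere)
```

On unsupported GPUs the safe fallback is PyTorch's built-in SDPA attention, which is what the DiT loader below does ("attention implementation: sdpa"); the error in the title suggests a later component still attempts a FlashAttention path.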
Initializing DiT model 1: acestep-v15-turbo...
2026-02-23 17:09:45.821 | INFO | acestep.handler:_ensure_model_downloaded:181 - Downloading unified repository ACE-Step/Ace-Step1.5 to /content/Ace-Step-v1.5/data/checkpoints...
Fetching 28 files: 0% 0/28 [00:00<?, ?steps/s]
vae/diffusion_pytorch_model.safetensors: 100% 337M/337M [00:17<00:00, 19.7MB/s]
acestep-v15-turbo/silence_latent.pt: 100% 3.84M/3.84M [00:00<00:00, 10.4MB/s]
Qwen3-Embedding-0.6B/model.safetensors: 100% 1.19G/1.19G [01:36<00:00, 12.3MB/s]
acestep-5Hz-lm-1.7B/model.safetensors: 100% 3.71G/3.71G [01:38<00:00, 37.8MB/s]
acestep-v15-turbo/model.safetensors: 100% 4.79G/4.79G [01:41<00:00, 47.1MB/s]
Fetching 28 files: 100% 28/28 [01:42<00:00, 3.65s/steps]
2026-02-23 17:11:28.205 | INFO | acestep.handler:_ensure_model_downloaded:195 - Repository ACE-Step/Ace-Step1.5 downloaded successfully to /content/Ace-Step-v1.5/data/checkpoints
2026-02-23 17:11:28.205 | INFO | acestep.handler:initialize_service:463 - [initialize_service] Attempting to load model with attention implementation: sdpa
2026-02-23 17:11:30.030 | INFO | acestep.handler:initialize_service:486 - [initialize_service] Keeping main model on cuda (persistent)
torch_dtype is deprecated! Use dtype instead!
Fetching 7 files: 0% 0/7 [00:00<?, ?steps/s]
flash_attn_interface.py: 41.3kB [00:00, 91.4MB/s]
build/torch29-cxx11-cu128-x86_64-linux/(…): 100% 804M/804M [00:03<00:00, 218MB/s]
Fetching 7 files: 100% 7/7 [00:03<00:00, 1.80steps/s]
2026-02-23 17:11:51.748 | INFO | acestep.handler:initialize_service:574 - [initialize_service] Text encoder loaded with kernels-community/flash-attn3
DiT model 1 initialized successfully
Initializing 5Hz LM: acestep-5Hz-lm-1.7B...
2026-02-23 17:11:51.755 | INFO | acestep.llm_inference:initialize:374 - [LLM Init Debug] IS_ZEROGPU=False, IS_HUGGINGFACE_SPACE=False
2026-02-23 17:11:51.755 | INFO | acestep.llm_inference:initialize:375 - [LLM Init Debug] torch.cuda.is_available()=True
2026-02-23 17:11:51.755 | INFO | acestep.llm_inference:initialize:376 - [LLM Init Debug] device=cuda, offload_to_cpu=True
2026-02-23 17:11:51.755 | INFO | acestep.llm_inference:initialize:392 - loading 5Hz LM tokenizer... it may take 80~90s
2026-02-23 17:12:22.264 | INFO | acestep.llm_inference:initialize:396 - 5Hz LM tokenizer loaded successfully in 30.51 seconds
2026-02-23 17:12:22.264 | INFO | acestep.llm_inference:initialize:400 - Initializing constrained decoding processor...
2026-02-23 17:12:23.755 | WARNING | acestep.constrained_logits_processor:_precompute_audio_code_tokens:545 - Found 1535 audio code tokens with values outside valid range [0, 63999]
2026-02-23 17:12:28.153 | INFO | acestep.llm_inference:initialize:407 - Constrained processor initialized in 5.89 seconds
2026-02-23 17:12:30.063 | ERROR | acestep.llm_inference:_initialize_5hz_lm_vllm:445 - nano-vllm is not installed. Please install it using 'cd acestep/third_parts/nano-vllm && pip install .
2026-02-23 17:12:30.063 | INFO | acestep.llm_inference:initialize:413 - 5Hz LM status message: ❌ nano-vllm is not installed. Please install it using 'cd acestep/third_parts/nano-vllm && pip install .
2026-02-23 17:12:30.063 | WARNING | acestep.llm_inference:initialize:418 - vllm initialization failed, falling back to PyTorch backend
2026-02-23 17:12:30.063 | INFO | acestep.llm_inference:_load_pytorch_model:237 - [LLM Load] Attempting to load model with attention implementation: kernels-community/flash-attn3
Fetching 7 files: 100% 7/7 [00:00<00:00, 19040.29steps/s]
Fetching 7 files: 100% 7/7 [00:00<00:00, 10145.17steps/s]
2026-02-23 17:12:30.935 | INFO | acestep.llm_inference:_load_pytorch_model:251 - [LLM Load Debug] Model loaded with kernels-community/flash-attn3, initial device: cpu
2026-02-23 17:12:30.946 | INFO | acestep.llm_inference:_load_pytorch_model:256 - [LLM Load Debug] After .to(), model device: cpu
2026-02-23 17:12:30.951 | INFO | acestep.llm_inference:_load_pytorch_model:262 - 5Hz LM initialized successfully using PyTorch backend on cuda
5Hz LM initialized successfully
Service initialization completed!
Creating Gradio interface...
Enabling queue for multi-user support...
Launching server on 0.0.0.0:7860...

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run gradio deploy from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
2026-02-23 17:16:32.395 | INFO | acestep.inference:create_sample:914 - [create_sample Debug] Entry: IS_HUGGINGFACE_SPACE=False
2026-02-23 17:16:32.395 | INFO | acestep.inference:create_sample:915 - [create_sample Debug] torch.cuda.is_available()=True
2026-02-23 17:16:32.396 | INFO | acestep.inference:create_sample:917 - [create_sample Debug] torch.cuda.current_device()=0
2026-02-23 17:16:32.396 | INFO | acestep.inference:create_sample:918 - [create_sample Debug] llm_handler.device=cuda, llm_handler.offload_to_cpu=True
2026-02-23 17:16:32.396 | INFO | acestep.inference:create_sample:921 - [create_sample Debug] Model device: cpu
2026-02-23 17:16:32.396 | INFO | acestep.llm_inference:create_sample_from_query:1648 - Creating sample from query: una canzone italiana italo disco su fleximan, il supereroe che taglia i pali con la sega circolare p... (instrumental=[], vocal_language=it)
2026-02-23 17:16:32.433 | DEBUG | acestep.llm_inference:create_sample_from_query:1655 - Formatted prompt for inspiration: <|im_start|>system

Instruction

Expand the user's input into a more detailed and specific musical description:

<|im_end|>
<|im_start|>user
una canzone italiana italo disco su fleximan, il supereroe che taglia i pali con la sega circolare per farci circolare

instrumental: false<|im_end|>
<|im_start|>assistant

2026-02-23 17:16:32.433 | INFO | acestep.llm_inference:create_sample_from_query:1664 - Using user-specified language: it
2026-02-23 17:16:32.435 | DEBUG | acestep.constrained_logits_processor:set_target_duration:1269 - Target duration cleared, no duration constraint
2026-02-23 17:16:32.435 | DEBUG | acestep.constrained_logits_processor:set_user_metadata:425 - User provided metadata fields: ['language']
2026-02-23 17:16:32.435 | INFO | acestep.llm_inference:_load_model_context:2379 - [_load_model_context Debug] Entry: offload_to_cpu=True, backend=pt, self.device=cuda
2026-02-23 17:16:32.435 | INFO | acestep.llm_inference:_load_model_context:2380 - [_load_model_context Debug] torch.cuda.is_available()=True, IS_ZEROGPU=False
2026-02-23 17:16:32.435 | INFO | acestep.llm_inference:_load_model_context:2385 - [_load_model_context Debug] Model current device: cpu
2026-02-23 17:16:32.435 | INFO | acestep.llm_inference:_load_model_context:2397 - [_load_model_context Debug] Moving model from CPU to cuda
2026-02-23 17:16:46.959 | INFO | acestep.llm_inference:_load_model_context:2399 - [_load_model_context Debug] Model now on: cuda:0
2026-02-23 17:16:46.960 | INFO | acestep.llm_inference:_load_model_context:2417 - Loading LLM to cuda
2026-02-23 17:16:46.966 | INFO | acestep.llm_inference:_load_model_context:2422 - Loaded LLM to cuda in 0.0064s
2026-02-23 17:16:46.966 | INFO | acestep.llm_inference:_run_pt_single:645 - [_run_pt_single Debug] Inputs moved to model device: cuda:0
2026-02-23 17:16:46.966 | INFO | acestep.llm_inference:_run_pt_single:646 - [_run_pt_single Debug] Input actual device: cuda:0
2026-02-23 17:16:47.626 | INFO | acestep.llm_inference:_load_model_context:2428 - Offloading LLM to CPU
2026-02-23 17:16:50.540 | INFO | acestep.llm_inference:_load_model_context:2434 - Offloaded LLM to CPU in 2.9136s
Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/gradio/queueing.py", line 766, in process_events
response = await route_utils.call_process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/gradio/route_utils.py", line 355, in call_process_api
output = await app.get_blocks().process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/gradio/blocks.py", line 2147, in process_api
result = await self.call_function(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/gradio/blocks.py", line 1641, in call_function
prediction = await utils.async_iteration(iterator)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/gradio/utils.py", line 859, in async_iteration
return await anext(iterator)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/gradio/utils.py", line 850, in anext
return await anyio.to_thread.run_sync(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/anyio/to_thread.py", line 63, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/anyio/_backends/_asyncio.py", line 2502, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/anyio/_backends/_asyncio.py", line 986, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/gradio/utils.py", line 833, in run_sync_iterator_async
return next(iterator)
^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/gradio/utils.py", line 1017, in gen_wrapper
response = next(iterator)
^^^^^^^^^^^^^^
File "/content/Ace-Step-v1.5/acestep/gradio_ui/events/init.py", line 716, in generation_wrapper
raise gr.Error(f"Failed to create sample: {result.status_message}")
gradio.exceptions.Error: 'Failed to create sample: ❌ Error generating from formatted prompt: FlashAttention only supports Ampere GPUs or newer.'
(A second generation attempt at 17:17 fails with the identical traceback and error: `FlashAttention only supports Ampere GPUs or newer.`)
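For context: FlashAttention 2 requires an NVIDIA GPU of compute capability 8.0 or newer (Ampere), while the free Colab tier typically assigns a T4 (Turing, sm_75), which triggers exactly this error. A possible workaround, assuming the model is loaded through `transformers` (the `pick_attn_implementation` helper name below is hypothetical, not part of Ace-Step), is to select the attention backend from the device's compute capability and fall back to PyTorch SDPA on older GPUs:

```python
def pick_attn_implementation(cuda_available: bool, capability_major: int) -> str:
    """Choose a safe `attn_implementation` value for transformers' from_pretrained().

    FlashAttention 2 needs compute capability >= 8.0 (Ampere or newer).
    Colab's free T4 (Turing, sm_75) and V100 (Volta, sm_70) do not qualify,
    so fall back to PyTorch's scaled_dot_product_attention ("sdpa"),
    which runs on any GPU (and on CPU).
    """
    if cuda_available and capability_major >= 8:
        return "flash_attention_2"
    return "sdpa"
```

In a live session the inputs would come from `torch.cuda.is_available()` and `torch.cuda.get_device_capability()`, and the result would be passed as `attn_implementation=` to `AutoModelForCausalLM.from_pretrained(...)`. Whether Ace-Step exposes such an option in its own loader is an assumption; it may require patching where the 5Hz LM is initialized.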
