fix: generated caption no longer gets truncated by jdluzen · Pull Request #19 · ServeurpersoCom/acestep.cpp

jdluzen · 2026-03-09T23:30:15Z

I've been attempting to track down an issue where the generated caption is cut off mid-sentence.
Occasionally the lyrics also fail to generate or come out garbled; I wonder whether that is the same issue.

{
    "caption": "epic rock and roll about ramen noodles",
    "duration": 0,
    "lyrics": "",
    "inference_steps": 8,
    "vocal_language": "en"
}

I fully expect that there is a better solution, but this is what works for me right now for both regular and instrumental generation.

Summary by CodeRabbit

Bug Fixes
- Improved text generation quality for lyrics and reasoning modes by disabling CFG during text expansion.
- Language constraints are now applied selectively, so they no longer restrict generation where they are not needed.

coderabbitai · 2026-03-09T23:30:29Z

📝 Walkthrough


This pull request refines the fill-step generation logic: it disables Classifier-Free Guidance (CFG) during text expansion and enables the constrained-decoding FSM selectively, depending on whether lyrics are needed and whether CoT reasoning is active, so that CFG no longer distorts free-text sampling.

Changes

Cohort / File(s)	Summary
Selective FSM and CFG Control (`tools/ace-qwen3.cpp`)	Introduces an `active_fsm` pointer that enables the FSM conditionally: for lyrics only when a valid vocal_language is set, and for non-lyrics fills only when CoT reasoning is disabled. CFG is disabled (set to 1.0) during text expansion for lyrics generation or CoT reasoning to avoid sampling distortion. The `phase1_batch` invocation now receives the `active_fsm` pointer instead of a direct FSM boolean flag.
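The selection rule in the Changes row can be sketched as follows. This is a minimal illustration, not the code from `tools/ace-qwen3.cpp`: `ConstrainedFSM`, `select_active_fsm`, and the `forced_language` field are hypothetical names, and treating an empty `vocal_language` as "not valid" is an assumption. Returning a pointer, with `nullptr` meaning "sample unconstrained", mirrors the walkthrough's description of passing `active_fsm` instead of a boolean.

```cpp
// Hypothetical sketch of the active_fsm selection rule; names are illustrative.
#include <cassert>
#include <string>

struct ConstrainedFSM {
    std::string forced_language;  // empty: no language constraint
};

// Returns the FSM to hand to the phase-1 generator, or nullptr for
// unconstrained sampling.
ConstrainedFSM* select_active_fsm(ConstrainedFSM& fsm,
                                  bool need_lyrics,
                                  bool use_cot_caption,
                                  const std::string& vocal_language) {
    if (need_lyrics) {
        // Lyrics: constrain only when a vocal language is available.
        // Treating an empty string as "no valid language" is an assumption.
        if (vocal_language.empty()) return nullptr;
        fsm.forced_language = vocal_language;
        return &fsm;
    }
    // Non-lyrics fill: constrain only when CoT reasoning is disabled.
    return use_cot_caption ? nullptr : &fsm;
}
```

Using a nullable pointer lets the generator keep a single code path for constrained and free sampling.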

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

#14 (fix: pass generated caption to output json file): complements this PR by propagating the generated caption and vocal_language information, while this PR controls when the FSM is enabled and how CFG behaves during text expansion in the same caption/CoT flow.

Suggested reviewers

ServeurpersoCom

Poem

🐰 With careful paws, I toggle FSM's gate,
CFG dims when lyrics must generate,
CoT thoughts flow free from sampling's bend,
Each flag aligned, the right conditions blend! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00%, which is below the required threshold of 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title 'fix: generated caption no longer gets truncated' is specific and directly addresses the main issue resolved by the changeset, which involves modifications to CFG and FSM logic to prevent caption truncation during text expansion.


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tools/ace-qwen3.cpp`:
- Around lines 794-797: The current logic only sets fill_cfg = 1.0f when
need_lyrics or req.use_cot_caption are true, but every path that calls
generate_phase1_batch() should have CFG disabled; change the assignment so
fill_cfg is always 1.0f for phase-1 fills (regardless of need_lyrics or
req.use_cot_caption) prior to calling generate_phase1_batch(), leaving
fill_top_p as-is and still using cfg_scale elsewhere; update references to
fill_cfg in the phase-1 generation code (e.g., where generate_phase1_batch() is
invoked) to ensure metadata-only fills also use fill_cfg = 1.0f.

ℹ️ Review info


📥 Commits

Reviewing files that changed between the PR base (1d57065) and 90e0325.

📒 Files selected for processing (1)

tools/ace-qwen3.cpp

coderabbitai · 2026-03-09T23:34:03Z

tools/ace-qwen3.cpp

+        // Disable CFG for ANY textual expansion (lyrics OR CoT reasoning),
+        // as CFG distorts text logits and forces premature newlines.
+        float fill_cfg   = (need_lyrics || req.use_cot_caption) ? 1.0f : cfg_scale;
+        float fill_top_p = top_p;
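For context, here is a minimal sketch of classifier-free guidance applied to next-token logits, using the common formulation `guided = uncond + scale * (cond - uncond)`. The function name is hypothetical, and whether acestep.cpp mixes logits exactly this way is an assumption. It shows why `fill_cfg = 1.0f` means "CFG off": at scale 1.0 the formula collapses to the conditional logits, while any other scale pushes every logit away from the unconditional prediction.

```cpp
// Minimal CFG sketch on logits; the exact acestep.cpp formulation is assumed.
#include <cassert>
#include <cstddef>
#include <vector>

std::vector<float> apply_cfg(const std::vector<float>& cond,
                             const std::vector<float>& uncond,
                             float scale) {
    assert(cond.size() == uncond.size());
    // scale == 1.0 reduces the formula to `cond`; return it directly so
    // "CFG off" is exact and the unconditional pass could be skipped.
    if (scale == 1.0f) return cond;
    std::vector<float> out(cond.size());
    for (std::size_t i = 0; i < cond.size(); ++i)
        out[i] = uncond[i] + scale * (cond[i] - uncond[i]);
    return out;
}
```

With `scale = 2.0`, cond = {2.0, 1.0, 0.5} and uncond = {1.0, 1.5, 0.5} become {3.0, 0.5, 0.5}: every difference between the two passes is doubled, which is the kind of logit reshaping the diff comment blames for premature newlines.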


⚠️ Potential issue | 🟠 Major

Keep CFG off for every phase-1 fill path.

Lines 794-797 force fill_cfg = 1.0f only when need_lyrics or req.use_cot_caption is set, but this branch always calls generate_phase1_batch() to produce text that is later parsed back into AcePrompt. A metadata-only fill therefore still runs text sampling with CFG, so that fill mode keeps the same truncation/garbling risk this change is trying to remove.

Suggested fix

-        // Disable CFG for ANY textual expansion (lyrics OR CoT reasoning),
-        // as CFG distorts text logits and forces premature newlines.
-        float fill_cfg   = (need_lyrics || req.use_cot_caption) ? 1.0f : cfg_scale;
+        // Phase 1 always generates text in the fill path, so keep CFG off here.
+        float fill_cfg   = 1.0f;



jdluzen · 2026-03-10

A one-sample test with instrumental works, but a non-instrumental test that generated lyrics still truncated the caption.

coderabbitai · 2026-03-10

@jdluzen, thanks for testing! That's a very useful data point.

Since the PR already sets fill_cfg = 1.0f when need_lyrics is true, CFG is not the active issue for the lyrics path. The likely culprit is that the old code also reset fill_top_p and fill_top_k specifically for lyrics, and those lines were removed in this PR:

// Old code (before this PR); these two lines are now gone:
fill_top_p = 1.0f;
fill_top_k = 0;

Without those overrides, lyrics generation now uses whatever top_p and top_k values come from the request. A restrictive top_p < 1.0 (or a non-zero top_k) shrinks the candidate pool during free-text sampling, which can cause the sampler to pick TOKEN_IM_END early and truncate mid-sentence.
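A minimal sketch of the mechanism described above: top-k and top-p filtering keep only the highest-probability tokens and renormalize, so an end-of-turn token that is already among the top candidates gains probability. The function name, the top-k-then-top-p order, and the toy distribution are assumptions for illustration, not acestep.cpp's actual sampler.

```cpp
// Illustrative top-k / top-p (nucleus) filter; not acestep.cpp's sampler.
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns the renormalized distribution after filtering.
// top_k <= 0 disables top-k; top_p >= 1.0 disables nucleus filtering.
std::vector<double> filter_top_k_top_p(const std::vector<double>& probs,
                                       int top_k, double top_p) {
    assert(!probs.empty());
    // Token ids sorted by descending probability.
    std::vector<std::size_t> order(probs.size());
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) { return probs[a] > probs[b]; });

    std::size_t keep = probs.size();
    if (top_k > 0) keep = std::min(keep, static_cast<std::size_t>(top_k));
    if (top_p < 1.0) {
        // Keep the smallest prefix whose cumulative mass reaches top_p.
        double cum = 0.0;
        std::size_t n = 0;
        while (n < keep) {
            cum += probs[order[n]];
            ++n;
            if (cum >= top_p) break;
        }
        keep = n;
    }

    // Zero out filtered tokens and renormalize the survivors.
    std::vector<double> out(probs.size(), 0.0);
    double kept_mass = 0.0;
    for (std::size_t i = 0; i < keep; ++i) kept_mass += probs[order[i]];
    for (std::size_t i = 0; i < keep; ++i) out[order[i]] = probs[order[i]] / kept_mass;
    return out;
}
```

With probabilities {0.5, 0.25, 0.125, 0.125} and token 1 standing in for TOKEN_IM_END, `top_k = 2` (or `top_p = 0.75`) raises its probability from 0.25 to 1/3, while `top_k = 0, top_p = 1.0` leaves the distribution unchanged, which is the setting the suggested lyrics override restores.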

A secondary suspect is the FSM: with vocal_language = "en" (the example payload in your PR), fsm.force_language("en") is active during lyrics fill. If the language forcing masks tokens too aggressively, it can also block valid continuation tokens.
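A minimal sketch of how FSM-style constraints act on sampling: tokens the current state does not allow get a logit of negative infinity and can never be chosen. The allowed-set representation, token ids, and function names are hypothetical; acestep.cpp's FSM and `force_language` are not reproduced here. The example shows how an over-eager mask can remove the best continuation and leave an end-of-sequence token on top.

```cpp
// Illustrative logit masking for constrained decoding; names are hypothetical.
#include <cassert>
#include <cstddef>
#include <limits>
#include <unordered_set>
#include <vector>

// Sets every token outside `allowed` to -infinity so it cannot be sampled.
void mask_disallowed(std::vector<float>& logits,
                     const std::unordered_set<int>& allowed) {
    const float neg_inf = -std::numeric_limits<float>::infinity();
    for (std::size_t id = 0; id < logits.size(); ++id)
        if (allowed.count(static_cast<int>(id)) == 0) logits[id] = neg_inf;
}

// Greedy pick (argmax) over the logits.
int greedy_pick(const std::vector<float>& logits) {
    assert(!logits.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < logits.size(); ++i)
        if (logits[i] > logits[best]) best = i;
    return static_cast<int>(best);
}
```

With logits {1.0, 3.0, 0.5, 2.0}, where token 1 is a valid continuation the mask wrongly excludes and token 3 stands in for EOS, greedy selection picks token 1 before masking and token 3 after masking with allowed = {0, 2, 3}.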

Suggested fix: restore the sampling overrides for the lyrics path alongside the CFG change:

// Disable CFG for ANY textual expansion (lyrics OR CoT reasoning),
// as CFG distorts text logits and forces premature newlines.
float fill_cfg   = (need_lyrics || req.use_cot_caption) ? 1.0f : cfg_scale;
float fill_top_p = top_p;
int   fill_top_k = top_k;
if (need_lyrics) {
    // Free-text caption expansion: unrestricted nucleus sampling.
    // CFG is already off (fill_cfg = 1.0f above); also disable top_k
    // and open top_p so the sampler doesn't prematurely hit EOS.
    fill_top_p = 1.0f;
    fill_top_k = 0;
}

This restores the old behavior for the lyrics path while keeping the CFG fix for use_cot_caption. If you're still seeing truncation after this, the FSM language-forcing during caption expansion is worth investigating next (it may make sense to skip force_language when the fill step is generating a free-form caption rather than structured metadata).

ServeurpersoCom · 2026-03-10T15:19:59Z

Great work!

                          Python ref           acestep.cpp (before)    acestep.cpp (PR #19)
CFG during CoT            yes (bug)            yes (bug)               no (fixed)
FSM during CoT caption    yes (via processor)  yes                     no (disabled)
Default cfg_scale         1.0 (hides bug)      2.0 (exposes bug)       2.0 (bug fixed)
CoT caption tokens        ~180 (cfg=1.0)       47 (cfg=2.0, trunc)     179 (cfg=1.0, full)
Caption quality           full (by luck)       truncated               full (by design)
Audio codes impact        clean                degraded prompt         clean
DiT conditioning          full embedding       partial embedding       full embedding

* Add LEGO mode: generate instrumental stems over references (#19)
* Add LEGO mode: --lego <track> flag for dit-vae, example files, README docs
* Remove base model check from lego.sh
  Removed the echo statement for ensuring the base model.
* Refactor lego.sh by removing echo statements
  Removed echo statements for steps in the script.
* Implement error check for --lego with DiT model
  Add error handling for --lego option requiring base DiT model
* Move lego mode from `--lego <track>` CLI flag to `"lego"` JSON request field (#21)
* apply requested changes

---------

Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>

fix: generated caption no longer gets truncated

90e0325

coderabbitai bot reviewed Mar 9, 2026


ServeurpersoCom merged commit 0a91260 into ServeurpersoCom:master Mar 10, 2026
3 of 4 checks passed


ServeurpersoCom merged 1 commit into ServeurpersoCom:master from jdluzen:fix/captiontruncate
