Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
39 changes: 25 additions & 14 deletions tools/ace-qwen3.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -790,29 +790,40 @@ int main(int argc, char ** argv) {
prompt = build_lm_prompt(bpe, ace);
}
std::vector<int> uncond;
float fill_cfg = cfg_scale;
float fill_top_p = top_p;
int fill_top_k = top_k;
if (need_lyrics) {
// lyrics generation: free sampling, no CFG (matches original behavior)
fill_cfg = 1.0f;
fill_top_p = 1.0f;
fill_top_k = 0;
} else if (fill_cfg > 1.0f) {

// Disable CFG for ANY textual expansion (lyrics OR CoT reasoning),
// as CFG distorts text logits and forces premature newlines.
float fill_cfg = (need_lyrics || req.use_cot_caption) ? 1.0f : cfg_scale;
float fill_top_p = top_p;
Comment on lines +794 to +797
Copy link

@coderabbitai coderabbitai bot Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Keep CFG off for every phase-1 fill path.

Lines 794-797 only force fill_cfg = 1.0f for lyrics or use_cot_caption, but this branch always calls generate_phase1_batch() to produce text that is later parsed back into AcePrompt. Metadata-only fill therefore still runs textual sampling with CFG, so one fill mode keeps the same truncation/garbling risk this change is trying to remove.

Suggested fix
-        // Disable CFG for ANY textual expansion (lyrics OR CoT reasoning),
-        // as CFG distorts text logits and forces premature newlines.
-        float fill_cfg   = (need_lyrics || req.use_cot_caption) ? 1.0f : cfg_scale;
+        // Phase 1 always generates text in the fill path, so keep CFG off here.
+        float fill_cfg   = 1.0f;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Disable CFG for ANY textual expansion (lyrics OR CoT reasoning),
// as CFG distorts text logits and forces premature newlines.
float fill_cfg = (need_lyrics || req.use_cot_caption) ? 1.0f : cfg_scale;
float fill_top_p = top_p;
// Phase 1 always generates text in the fill path, so keep CFG off here.
float fill_cfg = 1.0f;
float fill_top_p = top_p;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/ace-qwen3.cpp` around lines 794 - 797, The current logic only sets
fill_cfg = 1.0f when need_lyrics or req.use_cot_caption are true, but every path
that calls generate_phase1_batch() should have CFG disabled; change the
assignment so fill_cfg is always 1.0f for phase-1 fills (regardless of
need_lyrics or req.use_cot_caption) prior to calling generate_phase1_batch(),
leaving fill_top_p as-is and still using cfg_scale elsewhere; update references
to fill_cfg in the phase-1 generation code (e.g., where generate_phase1_batch()
is invoked) to ensure metadata-only fills also use fill_cfg = 1.0f.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A 1 sample test with instrumental works, but a non-instrumental test when generating lyrics truncated the caption.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jdluzen, thanks for testing! That's a very useful data point.

Since the PR already sets fill_cfg = 1.0f when need_lyrics = true, CFG is not the active issue for the lyrics path. The likely culprit is that the old code also reset fill_top_p and fill_top_k for lyrics specifically, and those lines were removed in this PR:

// Old code (before this PR) — these two lines are now gone:
fill_top_p = 1.0f;
fill_top_k = 0;

Without those overrides, lyrics generation now uses whatever top_p and top_k values come from the request. A restrictive top_p < 1.0 (or a non-zero top_k) shrinks the candidate pool during free-text sampling, which can cause the sampler to pick TOKEN_IM_END early and truncate mid-sentence.

A secondary suspect is the FSM: with vocal_language = "en" (the example payload in your PR), fsm.force_language("en") is active during lyrics fill. If the language-forcing masks too aggressively, it can also block valid continuation tokens.

Suggested fix — restore the sampling overrides for the lyrics path alongside the CFG change:

// Disable CFG for ANY textual expansion (lyrics OR CoT reasoning),
// as CFG distorts text logits and forces premature newlines.
float fill_cfg   = (need_lyrics || req.use_cot_caption) ? 1.0f : cfg_scale;
float fill_top_p = top_p;
int   fill_top_k = top_k;

if (need_lyrics) {
    // Free-text caption expansion: unrestricted nucleus sampling.
    // CFG is already off (fill_cfg = 1.0f above); also disable top_k
    // and open top_p so the sampler doesn't prematurely hit EOS.
    fill_top_p = 1.0f;
    fill_top_k = 0;
}

This restores the old behavior for the lyrics path while keeping the CFG fix for use_cot_caption. If you're still seeing truncation after this, the FSM language-forcing during caption expansion is worth investigating next (it may make sense to skip force_language when the fill step is generating a free-form caption rather than structured metadata).

int fill_top_k = top_k;

if (fill_cfg > 1.0f) {
uncond = build_lm_prompt_uncond(bpe, ace, neg_prompt);
}

fsm.reset();
if (need_lyrics && use_fsm && ace.vocal_language != "unknown" && !ace.vocal_language.empty()) {
fsm.force_language(bpe, ace.vocal_language);
MetadataFSM * active_fsm = nullptr;

if (use_fsm) {
if (need_lyrics) {
// Free text for lyrics. Only use FSM if strictly forcing language.
if (ace.vocal_language != "unknown" && !ace.vocal_language.empty()) {
fsm.force_language(bpe, ace.vocal_language);
active_fsm = &fsm;
}
} else {
if (!req.use_cot_caption) {
active_fsm = &fsm;
}
}
}

fprintf(stderr, "[Fill] lyrics=%s metas=%s | %zu tokens, CFG: %.2f, N=%d\n", need_lyrics ? "generate" : "keep",
has_all_metas ? "complete" : "fill gaps", prompt.size(), fill_cfg, batch_size);

auto phase1_texts = generate_phase1_batch(&model, &bpe, prompt, 2048, temperature, fill_top_p, fill_top_k, seed,
batch_size, use_fsm ? &fsm : nullptr, need_lyrics, fill_cfg,
uncond.empty() ? nullptr : &uncond, !need_lyrics);
auto phase1_texts =
generate_phase1_batch(&model, &bpe, prompt, 2048, temperature, fill_top_p, fill_top_k, seed, batch_size,
active_fsm, need_lyrics, fill_cfg, uncond.empty() ? nullptr : &uncond, !need_lyrics);

parse_phase1_into_aces(phase1_texts, ace, aces, seed, "Fill", need_lyrics, req.use_cot_caption);

Expand Down
Loading