fix: pass generated caption to output json file #14

Merged
ServeurpersoCom merged 2 commits into ServeurpersoCom:master from jdluzen:patch-2 on Mar 8, 2026

@jdluzen
Contributor

@jdluzen jdluzen commented Mar 7, 2026

The output JSON file currently receives the user's original input prompt/caption. This change saves the generated caption instead.
It may not entirely fix the issue I've been trying to track down, but it should push things in the right direction.
Example:

{
    "caption": "epic rock and roll",
    "duration": 300,
    "lyrics": "[Instrumental]",
    "inference_steps": 8,
    "guidance_scale": 7.0
}

Now expands into

{
  "caption": "An explosive, high-energy instrumental piece driven by a powerful brass section",
  "lyrics": "[Instrumental]",
  "bpm": 86,
  "duration": 300.0,
  "keyscale": "G minor",
  "timesignature": "2",
  "vocal_language": "unknown",
  "seed": 8493467414169643341,
  "lm_temperature": 0.85,
  "lm_cfg_scale": 2.0,
  "lm_top_p": 0.90,
  "lm_top_k": 0,
  "lm_negative_prompt": "",
  "inference_steps": 8,
  "guidance_scale": 7.0,
  "shift": 3.0,
  "audio_codes": "[array here]"
}
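
A minimal sketch of the caption selection described above, assuming the generated caption should win whenever one was produced and the user's caption is kept otherwise. The names `CaptionMeta` and `pick_output_caption` are hypothetical and do not appear in the project; only the `caption` field mirrors the JSON shown here.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the request metadata; only the field relevant
// to this PR is modeled.
struct CaptionMeta {
    std::string caption;
};

// Returns the caption to write to the output JSON: the generated caption
// when one was parsed, otherwise the user's original caption.
static std::string pick_output_caption(const CaptionMeta & user, const CaptionMeta & parsed) {
    return parsed.caption.empty() ? user.caption : parsed.caption;
}

// Small self-check covering both cases.
static void check_pick_output_caption() {
    CaptionMeta user{"epic rock and roll"};
    CaptionMeta generated{"An explosive, high-energy instrumental piece"};
    CaptionMeta none{""};
    assert(pick_output_caption(user, generated) == generated.caption);
    assert(pick_output_caption(user, none) == "epic rock and roll");
}
```

Falling back to the user's caption when nothing was generated keeps the output file usable even if the generation step yields no caption.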

Summary by CodeRabbit

  • New Features

    • Captions now properly propagate during metadata parsing when available.
  • Improvements

    • Optimized metadata field handling for consistency and improved performance across BPM, duration, key scale, time signature, and vocal language parameters.

@ServeurpersoCom
Owner

I'm looking into it. I started out avoiding exceptions when handling JSON data, to keep things simpler than the Python reference project. But we do need gap-fill, so I'm also looking at the vocal language field, which has the same problem.

@ServeurpersoCom
Owner

We need this!

--- a/src/metadata-fsm.h
+++ b/src/metadata-fsm.h
@@ -401,9 +401,13 @@ static void parse_phase1_into_aces(const std::vector<std::string> & texts,
         if (!parsed.timesignature.empty() && base.timesignature.empty()) {
             aces[i].timesignature = parsed.timesignature;
         }
-        if (!parsed.vocal_language.empty() && base.vocal_language.empty()) {
+        if (!parsed.vocal_language.empty() && (base.vocal_language.empty() || base.vocal_language == "unknown")) {
             aces[i].vocal_language = parsed.vocal_language;
         }
+        // caption: LLM enriches the user caption via CoT (e.g. "rock" -> detailed description)
+        if (!parsed.caption.empty()) {
+            aces[i].caption = parsed.caption;
+        }
         // lyrics: only generated when user had none
         if (merge_lyrics && !parsed.lyrics.empty()) {
             aces[i].lyrics = parsed.lyrics;
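
The `vocal_language` change in the hunk above can be read in isolation as follows. This is a sketch only, assuming that `"unknown"` is a placeholder meaning the user did not choose a language, and that the output otherwise keeps the user's value; `language_is_unset` and `merge_vocal_language` are hypothetical names, not functions from `src/metadata-fsm.h`.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: treats both an empty string and the "unknown"
// placeholder as "no vocal language was set by the user".
static bool language_is_unset(const std::string & lang) {
    return lang.empty() || lang == "unknown";
}

// Gap-fill: take the parsed value only when the user's value is unset and
// the parsed value is non-empty; otherwise keep the user's value.
static std::string merge_vocal_language(const std::string & base, const std::string & parsed) {
    if (!parsed.empty() && language_is_unset(base)) {
        return parsed;
    }
    return base;
}

// Small self-check of the gap-fill behavior.
static void check_merge_vocal_language() {
    assert(merge_vocal_language("unknown", "en") == "en");
    assert(merge_vocal_language("fr", "en") == "fr");
}
```

Without the `"unknown"` check, a default placeholder would count as a user-supplied value and the parsed language would never be applied.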

@ServeurpersoCom ServeurpersoCom merged commit d6ab814 into ServeurpersoCom:master Mar 8, 2026
2 of 3 checks passed
@coderabbitai

coderabbitai bot commented Mar 8, 2026

Caution

Review failed

The pull request is closed.


📥 Commits

Reviewing files that changed from the base of the PR and between 05829ec and c8da9fd.

📒 Files selected for processing (1)
  • src/metadata-fsm.h

Walkthrough

Refactors the Phase 1 merging logic in the metadata state machine by consolidating the gap-fill assignments for BPM, duration, keyscale, timesignature, and vocal_language from multiple if-blocks into single-line statements. Also adds caption propagation when parsed.caption is non-empty, while preserving the duration defaults and bounds checks.

Changes

Cohort / File(s) | Summary
Phase 1 Merging Refactoring (src/metadata-fsm.h) | Collapsed multiple if-block gap-fill assignments into single-line statements for BPM, duration, keyscale, timesignature, and vocal_language. Added caption propagation logic. Maintained existing duration defaulting, bounds validation, and lyrics handling.
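
One way the single-line gap-fill statements summarized above could look is sketched below. This is an illustration under assumptions, not the code from the PR: `fill_gap`, `Phase1Fields`, and `merge_text_fields` are hypothetical names, and only two string fields are modeled.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: copies `parsed` into `dst` only when the user-supplied
// `base` value is empty and the parsed value is not.
static void fill_gap(std::string & dst, const std::string & base, const std::string & parsed) {
    if (base.empty() && !parsed.empty()) {
        dst = parsed;
    }
}

// Hypothetical subset of the string-valued metadata fields.
struct Phase1Fields {
    std::string keyscale;
    std::string timesignature;
};

// Each field is merged with one statement instead of a separate if-block.
static void merge_text_fields(Phase1Fields & out, const Phase1Fields & base, const Phase1Fields & parsed) {
    fill_gap(out.keyscale, base.keyscale, parsed.keyscale);
    fill_gap(out.timesignature, base.timesignature, parsed.timesignature);
}

// Small self-check: a user-set field is kept, an empty one is filled.
static void check_merge_text_fields() {
    Phase1Fields base{"", "4"};
    Phase1Fields parsed{"G minor", "2"};
    Phase1Fields out = base;
    merge_text_fields(out, base, parsed);
    assert(out.keyscale == "G minor");
    assert(out.timesignature == "4");
}
```

Routing every field through one helper keeps the "user value wins" rule in a single place, so a later change such as the `"unknown"` placeholder check only has to be made once.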

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 Hop! Hop! The redundant blocks collapse,
Single lines now fill the gaps,
Captions flow like morning dew,
Cleaner code—a rabbit's view!

