
Conversation

@EdisonSu768 (Member) commented Feb 9, 2026

Summary by CodeRabbit

  • Documentation
    • Removed guides for creating and managing fine-tuning and pre-training tasks
    • Removed documentation for developing custom task templates
    • Removed installation and setup instructions for fine-tuning features
    • Added comprehensive guide for fine-tuning using Jupyter notebooks with VolcanoJob integration, including MLflow experiment tracking

@coderabbitai bot commented Feb 9, 2026

Walkthrough

Multiple fine-tuning and pre-training documentation pages were deleted from their distributed locations. A new comprehensive notebook-based fine-tuning guide was added to the workbench section, consolidating training workflows into a single resource with VolcanoJob integration and MLflow tracking.

Changes

Fine-tuning documentation removal
  Files: docs/en/fine_tuning/index.mdx, docs/en/fine_tuning/intro.mdx, docs/en/fine_tuning/how_to/index.mdx, docs/en/fine_tuning/how_to/creatings.mdx, docs/en/fine_tuning/how_to/develop_templates.mdx
  Summary: Deleted entire fine-tuning guide section including intro, overview, creation steps, and custom template development instructions.

Pre-training documentation removal
  Files: docs/en/pre-training/index.mdx, docs/en/pre-training/intro.mdx, docs/en/pre-training/how_to/index.mdx, docs/en/pre-training/how_to/creatings.mdx, docs/en/pre-training/how_to/develop_templates.mdx
  Summary: Deleted entire pre-training guide section including overview, task creation procedures, and custom template guidance.

Installation documentation removal
  Files: docs/en/installation/fine-tuning.mdx
  Summary: Removed fine-tuning setup and installation guide covering plugin deployment, feature enablement, and template configuration.

Workbench notebook-based guide addition
  Files: docs/en/workbench/how_to/fine_tunning_using_notebooks.mdx
  Summary: Added comprehensive guide for notebook-based fine-tuning and training workflows using the VolcanoJob Kubernetes scheduler, including dataset preparation, model deployment, MLflow integration, and post-training steps.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested reviewers

  • zhaomingkun1030
  • typhoonzero

Poem

🐰 Hopping through docs with careful care,
Old guides removed from here and there,
One notebook path now shines so bright,
VolcanoJobs and MLflow in sight!
Training tales, more lean and light!

🚥 Pre-merge checks: ✅ 3 passed

  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately describes the main change: removing fine-tuning navigation/documentation pages from the repository.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.




@coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@docs/en/workbench/how_to/fine_tunning_using_notebooks.mdx`:
- Line 1: Rename the file from fine_tunning_using_notebooks.mdx to
fine_tuning_using_notebooks.mdx and update all in-text occurrences of the typo
(e.g., the title and the phrase "fine tunning" around Line 92) to "fine tuning";
also update any imports, links, or references elsewhere in the repo that point
to fine_tunning_using_notebooks.mdx so they reference
fine_tuning_using_notebooks.mdx instead.
- Line 409: The git lfs track invocation uses an unquoted glob (*.safetensors)
which lets the shell expand matches prematurely; update the command shown for
"git lfs track *.safetensors" to wrap the glob pattern in quotes (e.g., use
single or double quotes around *.safetensors) so the pattern is passed to
git-lfs verbatim and not expanded by the shell.
- Around line 601-611: The YAML in the "Adapt Non-Nvidia GPUs" section still
uses nvidia.com/gpualloc, nvidia.com/gpucores and nvidia.com/gpumem; replace
those vendor-specific keys with a generic vendor placeholder (e.g.
<vendor>.gpualloc, <vendor>.gpucores, <vendor>.gpumem) or use the pattern from
the huawei.com/Ascend910 example so the template is vendor-agnostic and clearly
shows users to substitute their GPU vendor namespace.
- Around line 260-267: The dataset removal line `rm -rf ${DATASET_NAME}` makes
the subsequent existence check `if [ -d ${DATASET_NAME} ]` always false, so
either remove the unconditional `rm -rf ${DATASET_NAME}` to allow caching like
the model block (preferred), or alternatively remove the `if [ -d
${DATASET_NAME} ]`/else branch and always run the `git ... clone` command;
update the section around the DATASET_NAME and git clone lines accordingly so
the logic is consistent.
🧹 Nitpick comments (4)
docs/en/workbench/how_to/fine_tunning_using_notebooks.mdx (4)

314-316: Redundant export RANK=$RANK on Line 316.

RANK is already set on Line 314 (export RANK=$VC_TASK_INDEX). Line 316 re-exports it to itself, which is a no-op.

Suggested fix
                      export RANK=$VC_TASK_INDEX
                      export MASTER_HOST=$(echo "${VC_WORKER_HOSTS}" |awk -F',' '{print $1}')
-                      export RANK=$RANK
                      export WORLD_SIZE=$N_RANKS

428-445: Flat numbered list loses hierarchical structure.

Items 2–5 appear to be sub-items of item 1 (task image/locations), and items 7–9 are sub-items of item 6 (workspace PVC). The current flat 1–15 numbering makes it hard to follow. Consider using nested lists to convey the intended hierarchy.
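
For illustration, the intended hierarchy might look roughly like this (the item wording below is a placeholder, not the guide's actual text):

1. Task image and locations
   1. former item 2
   2. former item 3
   3. former item 4
   4. former item 5
2. Workspace PVC
   1. former item 7
   2. former item 8
   3. former item 9
3. remaining items (10–15) stay at the top level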


476-480: Inconsistent hyphenation: "pre-training" vs. "pretrained".

Line 480 uses "pre-training" while Line 437 uses "pretrained" (without hyphen). Pick one form and use it consistently throughout the document.


134-155: Dockerfile pins specific mirror URLs that may not be accessible outside China.

The apt sources are rewritten to mirrors.ustc.edu.cn and pip uses pypi.tuna.tsinghua.edu.cn. A note for international users to revert these to default mirrors would improve usability, or consider parameterizing the mirror via build args.
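
A minimal sketch of the build-arg approach (the ARG names and defaults below are illustrative, not taken from the guide's Dockerfile, and the sed line assumes a Debian/Ubuntu-style /etc/apt/sources.list):

# Mirror endpoints become overridable build arguments; defaults keep the current China mirrors.
ARG APT_MIRROR=mirrors.ustc.edu.cn
ARG PIP_INDEX_URL=https://pypi.tuna.tsinghua.edu.cn/simple

# Rewrite apt sources to the chosen mirror, then refresh the package index.
RUN sed -i "s|deb.debian.org|${APT_MIRROR}|g; s|security.debian.org|${APT_MIRROR}|g" /etc/apt/sources.list \
    && apt-get update

# Point pip at the chosen index.
RUN pip config set global.index-url "${PIP_INDEX_URL}"

International users could then build with something like docker build --build-arg APT_MIRROR=deb.debian.org --build-arg PIP_INDEX_URL=https://pypi.org/simple . to fall back to upstream sources.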

Comment on line +1

@@ -0,0 +1,629 @@
# Notebook Model Fine-tuning and Training General Solution

⚠️ Potential issue | 🟡 Minor

Typo in filename: "fine_tunning" → "fine_tuning"

The filename fine_tunning_using_notebooks.mdx has a double "n" typo. The same typo appears on Line 92 ("fine tunning"). Please rename the file and fix the in-text occurrence.
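
A minimal sketch of the rename plus reference sweep (run from the repo root; the sed step is illustrative and GNU-sed specific, so review the grep output before rewriting anything):

git mv docs/en/workbench/how_to/fine_tunning_using_notebooks.mdx \
       docs/en/workbench/how_to/fine_tuning_using_notebooks.mdx

# Find remaining references to the old filename or the "tunning" typo.
grep -rn "fine_tunning_using_notebooks\|fine tunning" docs/

# Rewrite file references in place once the matches look right.
grep -rl "fine_tunning_using_notebooks" docs/ | xargs sed -i 's/fine_tunning_using_notebooks/fine_tuning_using_notebooks/g'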


Comment on lines +260 to +267
rm -rf ${DATASET_NAME}
rm -rf data

if [ -d ${DATASET_NAME} ]; then
    echo "dataset ${DATASET_NAME} already exists skipping download"
else
    git -c http.sslVerify=false -c lfs.activitytimeout=36000 clone "https://${gitauth}@${DATASET_URL_NO_HTTPS}"
fi

⚠️ Potential issue | 🟡 Minor

Dead-code branch: dataset directory is always removed before the existence check.

rm -rf ${DATASET_NAME} on Line 260 ensures the if [ -d ${DATASET_NAME} ] on Line 263 is always false, making the "skip download" branch unreachable. This appears unintentional — the model-download block above (Lines 249-255) correctly checks before removing. Either drop the rm -rf (to allow caching like the model block) or drop the if/else and always clone.

Suggested fix (allow dataset caching like the model block)
-                  rm -rf ${DATASET_NAME}
-                  rm -rf data
-
                   if [ -d ${DATASET_NAME} ]; then
-                      echo "dataset ${DATASET_NAME} already exists skipping download"
+                      echo "dataset ${DATASET_NAME} already exists, skip downloading"
+                      (cd ${DATASET_NAME} && git -c http.sslVerify=false -c lfs.activitytimeout=36000 lfs pull)
                   else
                       git -c http.sslVerify=false -c lfs.activitytimeout=36000 clone "https://${gitauth}@${DATASET_URL_NO_HTTPS}"
                   fi


Comment on line +409

git init
git checkout -b sft-${push_branch}
git lfs track *.safetensors

⚠️ Potential issue | 🟡 Minor

Missing quotes around glob pattern in git lfs track.

Without quotes, the shell will expand *.safetensors against the current directory before passing it to git lfs track, which can cause unexpected behavior (tracking only the files that currently exist, or an error if none exist yet).

Suggested fix
-                  git lfs track *.safetensors
+                  git lfs track "*.safetensors"

Comment on lines +601 to +611
resources:
  # Ensure that there are sufficient resources to run fine tuning. If GPU is required, apply for the corresponding GPU/vGPU resources.
  requests:
    cpu: '1'
    memory: '8Gi'
  limits:
    cpu: '8'
    memory: '16Gi'
    nvidia.com/gpualloc: '1'
    nvidia.com/gpucores: '50'
    nvidia.com/gpumem: '8192'

⚠️ Potential issue | 🟠 Major

Non-Nvidia GPU template still specifies nvidia.com/* resource limits.

This YAML is in the "Adapt Non-Nvidia GPUs" section, yet the resource limits reference nvidia.com/gpualloc, nvidia.com/gpucores, and nvidia.com/gpumem. These should be replaced with a placeholder for the vendor-specific resource name (consistent with the huawei.com/Ascend910 example mentioned on Line 486).

Suggested fix
               resources:
-                # Ensure that there are sufficient resources to run fine tuning. If GPU is required, apply for the corresponding GPU/vGPU resources.
+                # Ensure that there are sufficient resources to run fine tuning.
+                # Replace the GPU resource key below with the vendor-specific resource name,
+                # e.g., huawei.com/Ascend910: 1
                 requests:
                   cpu: '1'
                   memory: '8Gi'
                 limits:
                   cpu: '8'
                   memory: '16Gi'
-                  nvidia.com/gpualloc: '1'
-                  nvidia.com/gpucores: '50'
-                  nvidia.com/gpumem: '8192'
+                  <vendor-gpu-resource-name>: '<count>'

@cloudflare-workers-and-pages

Deploying alauda-ai with Cloudflare Pages

Latest commit: 46b9cfb
Status: ✅  Deploy successful!
Preview URL: https://d676296e.alauda-ai.pages.dev
Branch Preview URL: https://chore-remove-ui-fine-tuning.alauda-ai.pages.dev

View logs

@YuNianan

/test-pass

@EdisonSu768 merged commit 8821594 into master Feb 10, 2026
3 checks passed
