chore: remove finetune nav AI-23833 #96
Conversation
Walkthrough

Multiple fine-tuning and pre-training documentation pages were deleted from their distributed locations. A new comprehensive notebook-based fine-tuning guide was added to the workbench section, consolidating training workflows into a single resource with VolcanoJob integration and MLflow tracking.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@docs/en/workbench/how_to/fine_tunning_using_notebooks.mdx`:
- Line 1: Rename the file from fine_tunning_using_notebooks.mdx to
fine_tuning_using_notebooks.mdx and update all in-text occurrences of the typo
(e.g., the title and the phrase "fine tunning" around Line 92) to "fine tuning";
also update any imports, links, or references elsewhere in the repo that point
to fine_tunning_using_notebooks.mdx so they reference
fine_tuning_using_notebooks.mdx instead.
- Line 409: The git lfs track invocation uses an unquoted glob (*.safetensors)
which lets the shell expand matches prematurely; update the command shown for
"git lfs track *.safetensors" to wrap the glob pattern in quotes (e.g., use
single or double quotes around *.safetensors) so the pattern is passed to
git-lfs verbatim and not expanded by the shell.
- Around line 601-611: The YAML in the "Adapt Non-Nvidia GPUs" section still
uses nvidia.com/gpualloc, nvidia.com/gpucores and nvidia.com/gpumem; replace
those vendor-specific keys with a generic vendor placeholder (e.g.
<vendor>.gpualloc, <vendor>.gpucores, <vendor>.gpumem) or use the pattern from
the huawei.com/Ascend910 example so the template is vendor-agnostic and clearly
shows users to substitute their GPU vendor namespace.
- Around line 260-267: The dataset removal line `rm -rf ${DATASET_NAME}` makes
the subsequent existence check `if [ -d ${DATASET_NAME} ]` always false, so
either remove the unconditional `rm -rf ${DATASET_NAME}` to allow caching like
the model block (preferred), or alternatively remove the `if [ -d
${DATASET_NAME} ]`/else branch and always run the `git ... clone` command;
update the section around the DATASET_NAME and git clone lines accordingly so
the logic is consistent.
🧹 Nitpick comments (4)
docs/en/workbench/how_to/fine_tunning_using_notebooks.mdx (4)
314-316: Redundant `export RANK=$RANK` on Line 316.

`RANK` is already set on Line 314 (`export RANK=$VC_TASK_INDEX`). Line 316 re-exports it to itself, which is a no-op.

Suggested fix

```diff
 export RANK=$VC_TASK_INDEX
 export MASTER_HOST=$(echo "${VC_WORKER_HOSTS}" | awk -F',' '{print $1}')
-export RANK=$RANK
 export WORLD_SIZE=$N_RANKS
```
428-445: Flat numbered list loses hierarchical structure.

Items 2–5 appear to be sub-items of item 1 (task image/locations), and items 7–9 are sub-items of item 6 (workspace PVC). The current flat 1–15 numbering makes it hard to follow. Consider using nested lists to convey the intended hierarchy, along the lines of the sketch below.
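A minimal sketch of the nested structure (the item names here are illustrative, not taken from the guide):

```markdown
1. Prepare the task image and storage locations
   1. Build the task image
   2. Choose the model and dataset locations
2. Create the workspace PVC
   1. Mount the PVC into the notebook
   2. Verify write access
```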
476-480: Inconsistent hyphenation: "pre-training" vs. "pretrained".

Line 480 uses "pre-training" while Line 437 uses "pretrained" (without hyphen). Pick one form and use it consistently throughout the document.
134-155: Dockerfile pins specific mirror URLs that may not be accessible outside China.

The `apt` sources are rewritten to `mirrors.ustc.edu.cn` and pip uses `pypi.tuna.tsinghua.edu.cn`. A note for international users to revert these to default mirrors would improve usability, or consider parameterizing the mirror via build args, as sketched below.
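A minimal sketch of the build-arg approach (the base image and default values are illustrative, not taken from the guide's Dockerfile):

```dockerfile
FROM ubuntu:22.04

# Hypothetical build args; override at build time to use other mirrors,
# e.g. --build-arg PIP_INDEX_URL=https://pypi.org/simple
ARG APT_MIRROR=mirrors.ustc.edu.cn
ARG PIP_INDEX_URL=https://pypi.tuna.tsinghua.edu.cn/simple

# Rewrite apt sources to the requested mirror.
RUN sed -i "s|archive.ubuntu.com|${APT_MIRROR}|g" /etc/apt/sources.list

# pip reads PIP_INDEX_URL from the environment, so no config edit is needed.
ENV PIP_INDEX_URL=${PIP_INDEX_URL}
```

International users could then build with `--build-arg APT_MIRROR=archive.ubuntu.com` to keep the default sources.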
```diff
@@ -0,0 +1,629 @@
+# Notebook Model Fine-tuning and Training General Solution
```
Typo in filename: "fine_tunning" → "fine_tuning"
The filename fine_tunning_using_notebooks.mdx has a double "n" typo. The same typo appears on Line 92 ("fine tunning"). Please rename the file and fix the in-text occurrence.
```sh
rm -rf ${DATASET_NAME}
rm -rf data

if [ -d ${DATASET_NAME} ]; then
    echo "dataset ${DATASET_NAME} already exists skipping download"
else
    git -c http.sslVerify=false -c lfs.activitytimeout=36000 clone "https://${gitauth}@${DATASET_URL_NO_HTTPS}"
fi
```
Dead-code branch: dataset directory is always removed before the existence check.
rm -rf ${DATASET_NAME} on Line 260 ensures the if [ -d ${DATASET_NAME} ] on Line 263 is always false, making the "skip download" branch unreachable. This appears unintentional — the model-download block above (Lines 249-255) correctly checks before removing. Either drop the rm -rf (to allow caching like the model block) or drop the if/else and always clone.
Suggested fix (allow dataset caching like the model block)

```diff
-rm -rf ${DATASET_NAME}
-rm -rf data
-
 if [ -d ${DATASET_NAME} ]; then
-    echo "dataset ${DATASET_NAME} already exists skipping download"
+    echo "dataset ${DATASET_NAME} already exists, skip downloading"
+    (cd ${DATASET_NAME} && git -c http.sslVerify=false -c lfs.activitytimeout=36000 lfs pull)
 else
     git -c http.sslVerify=false -c lfs.activitytimeout=36000 clone "https://${gitauth}@${DATASET_URL_NO_HTTPS}"
 fi
```
```sh
git init
git checkout -b sft-${push_branch}
git lfs track *.safetensors
```
Missing quotes around glob pattern in git lfs track.
Without quotes, the shell will expand *.safetensors against the current directory before passing it to git lfs track, which can cause unexpected behavior (tracking only the files that currently exist, or an error if none exist yet).
Suggested fix

```diff
-git lfs track *.safetensors
+git lfs track "*.safetensors"
```
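To see why this matters, a quick demonstration (assuming a bash shell and a directory that already contains matching files):

```sh
mkdir demo && cd demo && git init -q
git lfs install --local
touch a.safetensors b.safetensors

# Unquoted: the shell expands the glob first, so this is equivalent to
# `git lfs track a.safetensors b.safetensors`, i.e. two per-file entries.
git lfs track *.safetensors
cat .gitattributes

# Quoted: the pattern reaches git-lfs verbatim, producing the intended
# `*.safetensors filter=lfs diff=lfs merge=lfs -text` rule.
git lfs track "*.safetensors"
cat .gitattributes
```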
```yaml
resources:
  # Ensure that there are sufficient resources to run fine tuning. If GPU is required, apply for the corresponding GPU/vGPU resources.
  requests:
    cpu: '1'
    memory: '8Gi'
  limits:
    cpu: '8'
    memory: '16Gi'
    nvidia.com/gpualloc: '1'
    nvidia.com/gpucores: '50'
    nvidia.com/gpumem: '8192'
```
Non-Nvidia GPU template still specifies nvidia.com/* resource limits.
This YAML is in the "Adapt Non-Nvidia GPUs" section, yet the resource limits reference nvidia.com/gpualloc, nvidia.com/gpucores, and nvidia.com/gpumem. These should be replaced with a placeholder for the vendor-specific resource name (consistent with the huawei.com/Ascend910 example mentioned on Line 486).
Suggested fix

```diff
 resources:
-  # Ensure that there are sufficient resources to run fine tuning. If GPU is required, apply for the corresponding GPU/vGPU resources.
+  # Ensure that there are sufficient resources to run fine tuning.
+  # Replace the GPU resource key below with the vendor-specific resource name,
+  # e.g., huawei.com/Ascend910: 1
   requests:
     cpu: '1'
     memory: '8Gi'
   limits:
     cpu: '8'
     memory: '16Gi'
-    nvidia.com/gpualloc: '1'
-    nvidia.com/gpucores: '50'
-    nvidia.com/gpumem: '8192'
+    <vendor-gpu-resource-name>: '<count>'
```
Deploying alauda-ai with Cloudflare Pages

| | |
|---|---|
| Latest commit: | 46b9cfb |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://d676296e.alauda-ai.pages.dev |
| Branch Preview URL: | https://chore-remove-ui-fine-tuning.alauda-ai.pages.dev |
/test-pass |