Use lightspeed_rag_content package in OpenShift docs processing #40

Merged
jpodivin merged 5 commits into road-core:main from lpiwowar:lpiwowar/refactor-openshift-asciidoc
Mar 27, 2025

@lpiwowar
Contributor

This commit refactors the code responsible for cloning the AsciiDoc-formatted OpenShift docs and converting them to plain-text files.

  1. It moves all the OpenShift-related AsciiDoc code to the examples folder to ensure clear separation between OpenShift-specific code and road-core/rag-content code.

  2. It makes sure the OpenShift-specific code uses the lightspeed_rag_content.asciidoc package.

  3. It removes all the Ruby-based asciidoctor extensions, as they are now part of the lightspeed_rag_content package.

Depends-On: #39

@lpiwowar lpiwowar force-pushed the lpiwowar/refactor-openshift-asciidoc branch from a9e2ff4 to 0645673 Compare March 21, 2025 09:34
This commit refactors the code responsible for cloning the
AsciiDoc-formatted OpenShift docs and converting them to plain-text
files.

  1. It moves all the OpenShift-related AsciiDoc code to the examples
     folder to ensure clear separation between OpenShift-specific code
     and road-core/rag-content code.

  2. It makes sure the OpenShift-specific code uses the
     lightspeed_rag_content.asciidoc package.

  3. It removes all the Ruby-based asciidoctor extensions, as they are
     now part of the lightspeed_rag_content package.

Signed-off-by: Lukas Piwowarski <lpiwowar@redhat.com>
asciidoctor is now installed by default in the base image. There
is no need to install it again in the child image.
@lpiwowar lpiwowar force-pushed the lpiwowar/refactor-openshift-asciidoc branch from 0645673 to 92032ed Compare March 26, 2025 11:05
@lpiwowar
Contributor Author

Proof of testing

I tested the change locally with the following command: make run-ocp-example-test. I ran it on both main and lpiwowar:lpiwowar/refactor-openshift-asciidoc.

Main branch

...
STEP 7/8: COPY --from=ocp-rag-content /rag/embeddings_model /rag/embeddings_model
--> 2407f1a46448
STEP 8/8: CMD python query_rag.py --db-path /rag/vector_db/ocp_product_docs/$OCP_DOCS_VERSION/     -x ocp-product-docs-$(echo $OCP_DOCS_VERSION | sed 's/\./_/g')     --model-path /rag/embeddings_model     -t $SCORE_THRESHOLD     --query "$TEST_QUERY"
COMMIT test-rag-content
--> 16cad886ff1e
Successfully tagged localhost/test-rag-content:latest
16cad886ff1e307153a798418a7ac0ec938c41e7af63396f2d03f453d2cecc6f
podman run test-rag-content
The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
0it [00:00, ?it/s]
LLM is explicitly disabled. Using MockLLM.
/opt/app-root/lib64/python3.11/site-packages/transformers/tokenization_utils_base.py:1617: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be deprecated in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
Node ID: 8bb8e9d2-7af3-4c5b-bd70-f21213008764
Text: The CVO monitors the state of each applied resource and the
states reported by all cluster Operators. The CVO only proceeds with
the update when all manifests and cluster Operators in the active
Runlevel reach a stable condition. After the CVO updates the entire
control plane through this process, the Machine Config Operator (MCO)
updates the op...
Score:  0.631
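
The CMD in step 8 derives the index name passed to -x by replacing the dots in $OCP_DOCS_VERSION with underscores. A minimal Python sketch of the same transformation follows; the helper names `index_name` and `db_path` and the sample version `4.15` are illustrative assumptions, not taken from the repository.

```python
def index_name(ocp_docs_version: str) -> str:
    # Same effect as the shell pipeline in the CMD line:
    #   echo $OCP_DOCS_VERSION | sed 's/\./_/g'
    return "ocp-product-docs-" + ocp_docs_version.replace(".", "_")


def db_path(ocp_docs_version: str) -> str:
    # The vector DB path keeps the dotted version unchanged.
    return f"/rag/vector_db/ocp_product_docs/{ocp_docs_version}/"


if __name__ == "__main__":
    version = "4.15"  # hypothetical value of OCP_DOCS_VERSION
    print(index_name(version))
    print(db_path(version))
```

Only the index name is rewritten; the --db-path argument uses the version string as-is.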

lpiwowar:lpiwowar/refactor-openshift-asciidoc branch

STEP 4/8: ENV SCORE_THRESHOLD="0.6"
--> df92e24de055
STEP 5/8: COPY ./scripts/query_rag.py .
--> ed1cb5f26e6a
STEP 6/8: COPY --from=ocp-rag-content /rag/vector_db/ocp_product_docs /rag/vector_db/ocp_product_docs
--> 9fe38537e605
STEP 7/8: COPY --from=ocp-rag-content /rag/embeddings_model /rag/embeddings_model
--> 64e4fa113bb1
STEP 8/8: CMD python query_rag.py --db-path /rag/vector_db/ocp_product_docs/$OCP_DOCS_VERSION/     -x ocp-product-docs-$(echo $OCP_DOCS_VERSION | sed 's/\./_/g')     --model-path /rag/embeddings_model     -t $SCORE_THRESHOLD     --query "$TEST_QUERY"
COMMIT test-rag-content
--> 1ced714ede92
Successfully tagged localhost/test-rag-content:latest
1ced714ede926957c4843a0a7c8f3ed1cfcf4ed0d30d237314be3a563545680b
podman run test-rag-content
The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
0it [00:00, ?it/s]
LLM is explicitly disabled. Using MockLLM.
/opt/app-root/lib64/python3.11/site-packages/transformers/tokenization_utils_base.py:1617: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be deprecated in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
Node ID: 0156660f-0557-4b42-89db-7e4394d84cc7
Text: The CVO monitors the state of each applied resource and the
states reported by all cluster Operators. The CVO only proceeds with
the update when all manifests and cluster Operators in the active
Runlevel reach a stable condition. After the CVO updates the entire
control plane through this process, the Machine Config Operator (MCO)
updates the op...
Score:  0.631
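
The two runs above can also be compared mechanically rather than by eye. The sketch below assumes the query output has the "Node ID: / Text: / Score:" shape shown in the logs; the names `Hit`, `parse_hit`, and `same_result` are hypothetical helpers, not part of query_rag.py. Node IDs are ignored because they differ between the two runs shown.

```python
import re
from typing import NamedTuple


class Hit(NamedTuple):
    text: str
    score: float


# Matches a line such as "Score:  0.631".
_SCORE_RE = re.compile(r"^Score:\s*([0-9.]+)\s*$", re.MULTILINE)


def parse_hit(output: str) -> Hit:
    """Extract the text and score of the first hit from captured output."""
    start = output.index("Text:")
    score_match = _SCORE_RE.search(output, start)
    if score_match is None:
        raise ValueError("no 'Score:' line found after 'Text:'")
    raw_text = output[start + len("Text:"):score_match.start()]
    # Collapse line wrapping so that only the content is compared.
    text = " ".join(raw_text.split())
    return Hit(text=text, score=float(score_match.group(1)))


def same_result(output_a: str, output_b: str) -> bool:
    """True when both runs returned the same text with the same score."""
    return parse_hit(output_a) == parse_hit(output_b)
```

Feeding the captured output of podman run test-rag-content from each branch into `same_result` would be one way to confirm that the refactoring did not change the retrieved chunk or its score.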

@lpiwowar lpiwowar marked this pull request as ready for review March 26, 2025 11:15
@lpiwowar lpiwowar requested a review from umago March 26, 2025 11:17
@lpiwowar
Contributor Author

@jpodivin and @syedriko, please review this when you have time :)

Collaborator

@umago umago left a comment

LGTM

@jpodivin jpodivin merged commit 376e127 into road-core:main Mar 27, 2025
4 checks passed
@lpiwowar lpiwowar deleted the lpiwowar/refactor-openshift-asciidoc branch March 27, 2025 11:09