
Add lightspeed_rag_content.asciidoc subpackage #39

Merged
jpodivin merged 5 commits into road-core:main from lpiwowar:lpiwowar/asciidoc-package
Mar 25, 2025

Conversation

@lpiwowar
Contributor

This commit adds the lightspeed_rag_content.asciidoc package. The purpose of this package is to:

1. Provide an interface for easy conversion of AsciiDoc formatted files, mainly to text format.

The AsciidoctorConverter class can be used to convert AsciiDoc files. On the backend, the class uses the asciidoctor tool [1], which makes the package dependent on this tool and Ruby. The main reason for picking this tool is that, as of now, there is no easy way to convert AsciiDoc formatted files to text format using pure Python, and since we already have an extension written for asciidoctor, we can reuse it.

This commit does not rule out the possibility of introducing a new converter later with a more suitable backend based on pure Python.

One can convert the .adoc file either by using the AsciidoctorConverter class or by using the lightspeed_rag_content.asciidoc module as follows:

python -m lightspeed_rag_content.asciidoc convert -i input_file.adoc -o output_file.txt

2. Allow investigation of the structure of AsciiDoc formatted files.

The introduced package wraps an already existing Ruby script that dumps the structure of an .adoc file. This comes in handy when writing a custom Ruby extension for asciidoctor. The script can be used as follows:

python -m lightspeed_rag_content.asciidoc get_structure input.adoc

[1] https://asciidoctor.org/
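
For anyone who wants to drive the two commands above from a script, here is a minimal sketch. It only assembles the same command lines quoted in this description and hands them to `subprocess`. The helper names (`convert_command`, `get_structure_command`, `run`) are hypothetical and not part of the package, and the sketch deliberately does not touch the AsciidoctorConverter class, since its signature is not shown in this thread.

```python
import subprocess
import sys

# Module name taken from the commands quoted in the PR description.
MODULE = "lightspeed_rag_content.asciidoc"


def convert_command(input_file, output_file):
    """Build the argv for the 'convert' subcommand (hypothetical helper)."""
    return [
        sys.executable, "-m", MODULE, "convert",
        "-i", str(input_file),
        "-o", str(output_file),
    ]


def get_structure_command(input_file):
    """Build the argv for the 'get_structure' subcommand (hypothetical helper)."""
    return [sys.executable, "-m", MODULE, "get_structure", str(input_file)]


def run(command):
    """Run a command; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(command, check=True, capture_output=True, text=True)


if __name__ == "__main__":
    # Only print the command here; actually running it requires the package,
    # asciidoctor, and Ruby to be installed.
    print(" ".join(convert_command("input_file.adoc", "output_file.txt")[1:]))
```

Calling `run(convert_command("input_file.adoc", "output_file.txt"))` should behave like the shell invocation above, assuming the package and its asciidoctor dependency are available in the environment.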

This commit adds the lightspeed_rag_content.asciidoc package. The
purpose of this package is to:

1. Provide an interface for easy conversion of AsciiDoc formatted
   files, mainly to text format.

   The AsciidoctorConverter class can be used to convert AsciiDoc
   files. On the backend, the class uses the asciidoctor tool [1].
   This makes the package dependent on this tool and Ruby. The main
   reason for picking this tool is that, as of now, there is no easy
   way to convert AsciiDoc formatted files to text format using pure
   Python, and since we already have an extension written for
   asciidoctor, we can reuse it.

   This commit does not rule out the possibility of introducing a new
   converter later with a more suitable backend based on pure Python.

   One can convert the .adoc file either by using
   the AsciidoctorConverter class or by using the
   lightspeed_rag_content.asciidoc module as follows:

     python -m lightspeed_rag_content.asciidoc convert \
       -i input_file.adoc -o output_file.txt

2. Allow investigation of the structure of AsciiDoc formatted files.

   The introduced package wraps an already existing Ruby script that
   dumps the structure of an .adoc file. This comes in handy when
   writing a custom Ruby extension for asciidoctor. The script can be
   used as follows:

     python -m lightspeed_rag_content.asciidoc get_structure input.adoc

[1] https://asciidoctor.org/

Signed-off-by: Lukas Piwowarski <lpiwowar@redhat.com>
This commit adds unit tests for the lightspeed_rag_content.asciidoc
package.

Signed-off-by: Lukas Piwowarski <lpiwowar@redhat.com>
@lpiwowar lpiwowar force-pushed the lpiwowar/asciidoc-package branch from 0aa7e94 to 54e6eb6 Compare March 20, 2025 17:23
@lpiwowar lpiwowar marked this pull request as ready for review March 20, 2025 17:33
@lpiwowar
Contributor Author

Please take a look if you have time, @jpodivin, @umago, or @syedriko. I still cannot add reviewers.

@lpiwowar lpiwowar force-pushed the lpiwowar/asciidoc-package branch from 7e93a31 to 66a4814 Compare March 21, 2025 09:11
This commit allows using the AsciidoctorConverter to convert AsciiDoc
files to target formats that are by default supported by asciidoctor:

- html5
- xhtml5
- manpage

Signed-off-by: Lukas Piwowarski <lpiwowar@redhat.com>
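
To illustrate what selecting one of these target formats amounts to at the asciidoctor level, here is a minimal sketch that builds a direct `asciidoctor` invocation using its `-b` (backend) and `-o` (output file) options. This is not the package's implementation: how AsciidoctorConverter accepts the target format is not shown in this thread, and the helper name `asciidoctor_command` is hypothetical.

```python
# Target formats listed in the commit message above.
SUPPORTED_BACKENDS = ("html5", "xhtml5", "manpage")


def asciidoctor_command(input_file, output_file, backend):
    """Build an asciidoctor argv for one of the listed backends.

    Hypothetical helper for illustration only; it calls the asciidoctor
    CLI directly rather than going through AsciidoctorConverter.
    """
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(
            f"Unsupported backend {backend!r}; expected one of {SUPPORTED_BACKENDS}"
        )
    # -b selects the backend (output format), -o sets the output file.
    return ["asciidoctor", "-b", backend, "-o", str(output_file), str(input_file)]
```

Rejecting unknown backends before spawning the process keeps the error message close to the caller instead of relying on asciidoctor's own diagnostics.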
@lpiwowar lpiwowar force-pushed the lpiwowar/asciidoc-package branch from 66a4814 to 711c65c Compare March 21, 2025 09:32
@lpiwowar
Contributor Author

I've created an issue here -> #41

It seems like sometimes the push-ghcr job fails when downloading the embedding model. Also, it would be nice if the org members could re-run the jobs. Right now, if I want to rerun the job, I have to force push.

@jpodivin jpodivin requested review from jpodivin, syedriko and umago March 21, 2025 13:23
Collaborator

@umago umago left a comment

LGTM!

Collaborator

@umago umago left a comment

Thanks Lukas! LGTM!

This commit adds the asciidoctor binary to the base image. This allows
consumers of that image to use the lightspeed_rag_content.asciidoc
sub-package, which depends heavily on asciidoctor.
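
Because the sub-package depends on an external binary, a consumer working outside the base image may want to check for it up front. A minimal sketch, assuming the binary is expected on `PATH` under the name `asciidoctor`; the helper name `tool_available` is hypothetical and not part of the package.

```python
import shutil


def tool_available(name="asciidoctor"):
    """Return True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None


if __name__ == "__main__":
    if not tool_available():
        print("asciidoctor not found on PATH; install it or use the base image")
```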
@jpodivin jpodivin merged commit 32bba09 into road-core:main Mar 25, 2025
4 checks passed