JSv4 commented Jan 24, 2026

Summary

This PR introduces a pluggable comparison engine architecture for the python-redlines library, enabling support for multiple document comparison backends. It adds the Docxodus engine (a modern .NET 8.0 fork of Open-Xml-PowerTools) alongside the existing Open-Xml-PowerTools engine, with improved build infrastructure and comprehensive documentation.

Key Changes

Architecture & Core

  • New base class system: Introduced a ComparisonEngine abstract base class and a ComparisonError exception as the foundation of the pluggable architecture
  • Engine registry: Added EngineRegistry for dynamic engine discovery and selection via the get_engine() function
  • Binary manager: Created a BinaryManager utility class that handles platform-specific binary extraction and caching on Windows, Linux, and macOS (x64 and ARM64)

Engine Implementations

  • XmlPowerToolsEngine: Refactored the existing engine to inherit from ComparisonEngine, with improved error handling and preserved backward compatibility
  • DocxodusEngine: New engine wrapping the Docxodus .NET 8.0 binary, which provides improved detection of moved content and formatting changes
  • Both engines support the same compare(author, original, modified) interface
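Both engines wrap a compiled .NET binary behind `compare(author, original, modified)`. The sketch below shows that general pattern under stated assumptions: the binary is assumed to take its arguments in the order author, original path, modified path, output path, and the class `CliEngine` as well as the `stdout`/`stderr` attributes on `ComparisonError` are hypothetical, chosen to show how captured output can travel with the exception for debugging.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import List, Optional, Sequence, Tuple


class ComparisonError(Exception):
    """Raised when the comparison binary fails; carries its output for debugging."""

    def __init__(self, message: str, stdout: str = "", stderr: str = "") -> None:
        super().__init__(message)
        self.stdout = stdout
        self.stderr = stderr


class CliEngine:
    """Hypothetical engine that shells out to a redline CLI.

    Assumed invocation: <command...> <author> <original> <modified> <output>
    """

    def __init__(self, command: Sequence[str]) -> None:
        # The executable plus any fixed leading arguments.
        self.command: List[str] = list(command)

    def compare(
        self, author: str, original: bytes, modified: bytes
    ) -> Tuple[bytes, Optional[str], Optional[str]]:
        with tempfile.TemporaryDirectory() as tmp:
            tmp_dir = Path(tmp)
            original_path = tmp_dir / "original.docx"
            modified_path = tmp_dir / "modified.docx"
            output_path = tmp_dir / "redline.docx"
            original_path.write_bytes(original)
            modified_path.write_bytes(modified)

            result = subprocess.run(
                [*self.command, author, str(original_path), str(modified_path), str(output_path)],
                capture_output=True,
                text=True,
            )
            # Treat a non-zero exit or a missing output file as failure.
            if result.returncode != 0 or not output_path.exists():
                raise ComparisonError(
                    f"comparison failed with exit code {result.returncode}",
                    stdout=result.stdout,
                    stderr=result.stderr,
                )
            return output_path.read_bytes(), result.stdout or None, result.stderr or None
```

Taking the command as a list rather than a single binary path is only a convenience in this sketch: it lets the class be exercised with `[sys.executable, "-c", script]` standing in for a real comparison binary.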

Build System

  • Enhanced build_differ.py:
    • Refactored to support building multiple engines with configurable platforms
    • Added command-line arguments for selective engine/platform building (--engine, --platform)
    • Improved error handling and logging
    • Supports both tar.gz and zip archives, selected according to the target platform
  • New Docxodus C# project: Added csproj-docxodus/ with Program.cs and redline.csproj for building the Docxodus binary
  • Hatch build hook: Updated to skip binary builds when the SKIP_BINARY_BUILD environment variable is set (useful during development)
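The `--engine`/`--platform` selection and the `SKIP_BINARY_BUILD` switch can be sketched with `argparse`. Several details here are assumptions for illustration, not necessarily what `build_differ.py` or the Hatch hook actually do: the platform identifiers (borrowed from .NET runtime identifiers such as `linux-x64` and `osx-arm64`), the `all` default, the archive paths under `dist/<engine>/`, and treating any non-empty `SKIP_BINARY_BUILD` value as "skip".

```python
import argparse
import os
from typing import List, Mapping, Optional, Sequence, Tuple

# Hypothetical target lists; the real script may support a different set.
ENGINES = ["openxml-powertools", "docxodus"]
PLATFORMS = ["linux-x64", "linux-arm64", "win-x64", "win-arm64", "osx-x64", "osx-arm64"]


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Build comparison engine binaries.")
    parser.add_argument("--engine", choices=ENGINES + ["all"], default="all")
    parser.add_argument("--platform", choices=PLATFORMS + ["all"], default="all")
    return parser.parse_args(argv)


def archive_path(engine: str, platform: str) -> str:
    # Windows builds are zipped; everything else becomes a gzipped tarball.
    extension = "zip" if platform.startswith("win-") else "tar.gz"
    return f"dist/{engine}/{platform}.{extension}"


def build_plan(engine: str, platform: str) -> List[Tuple[str, str, str]]:
    """Expand 'all' and return (engine, platform, archive path) build targets."""
    engines = ENGINES if engine == "all" else [engine]
    platforms = PLATFORMS if platform == "all" else [platform]
    return [(e, p, archive_path(e, p)) for e in engines for p in platforms]


def should_build_binaries(environ: Mapping[str, str] = os.environ) -> bool:
    # A build hook could return early when SKIP_BINARY_BUILD has any non-empty value.
    return not environ.get("SKIP_BINARY_BUILD")
```

For example, `build_plan(*vars(parse_args(["--engine", "docxodus", "--platform", "win-x64"])).values())` narrows the work to a single Windows x64 Docxodus archive, which is the kind of selective build a CI job per platform would use.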

Project Metadata

  • Updated pyproject.toml with a proper description, keywords, and classifiers
  • Added optional dev dependencies for testing and building
  • Updated project URLs to point to the correct GitHub repository
  • Added pytest configuration with coverage support
  • New hatch scripts: build-engines, build-openxml, build-docxodus

Documentation

  • Added a comprehensive module docstring to __init__.py with quick-start examples
  • Added detailed docstrings to all new classes and methods
  • Improved code comments throughout

Directory Structure

  • Created separate dist subdirectories for each engine: dist/openxml-powertools/ and dist/docxodus/
  • Each directory has its own .gitignore for binary artifacts

Implementation Details

  • Lazy binary extraction: DocxodusEngine extracts its binary lazily, on first use, while XmlPowerToolsEngine keeps eager extraction for backward compatibility
  • Platform detection: Automatic detection of OS and architecture (x64/ARM64) with appropriate binary selection
  • Error handling: Comprehensive error messages with stdout/stderr capture for debugging
  • Backward compatibility: Maintained run_redline() method and extracted_binaries_path property on XmlPowerToolsEngine for existing code
  • Flexible build process: Build script can target specific engines and platforms, useful for CI/CD pipelines
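Platform detection and lazy extraction, as described in the implementation details above, could look like the following sketch. The values returned by `platform.system()` and `platform.machine()` are standard-library behavior, but the identifier strings, the `LazyBinary` class, and its `extract` callback are hypothetical names for illustration; `functools.cached_property` is one way, not necessarily the PR's way, to run extraction only on first access.

```python
import functools
import platform
from pathlib import Path
from typing import Callable, Optional

_OS_NAMES = {"linux": "linux", "windows": "win", "darwin": "osx"}
# platform.machine() reports e.g. "x86_64" (Linux/macOS), "AMD64" (Windows), "arm64"/"aarch64".
_ARCH_NAMES = {"x86_64": "x64", "amd64": "x64", "arm64": "arm64", "aarch64": "arm64"}


def detect_platform(system: Optional[str] = None, machine: Optional[str] = None) -> str:
    """Return an identifier such as 'linux-x64' or 'osx-arm64' for the host."""
    os_key = (system or platform.system()).lower()
    arch_key = (machine or platform.machine()).lower()
    if os_key not in _OS_NAMES or arch_key not in _ARCH_NAMES:
        raise RuntimeError(f"Unsupported platform: {os_key}/{arch_key}")
    return f"{_OS_NAMES[os_key]}-{_ARCH_NAMES[arch_key]}"


class LazyBinary:
    """Defers binary extraction until the path is first needed."""

    def __init__(self, extract: Callable[[str], Path], rid: Optional[str] = None) -> None:
        self._extract = extract  # hypothetical callback: unpack archive, return binary path
        self._rid = rid          # optional platform override, mainly for tests

    @functools.cached_property
    def path(self) -> Path:
        # Runs once; cached_property stores the result on the instance.
        return self._extract(self._rid or detect_platform())
```

An eager engine would instead call the extraction function in `__init__`; the lazy form means that constructing or listing an engine does not unpack a binary that may never be used.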

Testing Considerations

The changes maintain backward compatibility while enabling new functionality. Existing code using XmlPowerToolsEngine directly will continue to work unchanged. New code can use the registry system for more flexible engine selection.
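One way to keep `run_redline()` working for existing callers is a thin alias that delegates to `compare()`, checked with `unittest.mock` so that no binary is needed. `LegacyCompatibleEngine` is a hypothetical stand-in rather than `XmlPowerToolsEngine` itself, and the test function illustrates the mocking style; it is not one of the PR's tests.

```python
from unittest import mock


class LegacyCompatibleEngine:
    """Hypothetical stand-in for an engine that keeps the pre-registry API."""

    def compare(self, author, original, modified):
        # A real engine would invoke its comparison binary here.
        raise NotImplementedError

    def run_redline(self, author_tag, original, modified):
        """Legacy entry point kept for existing callers; delegates to compare()."""
        return self.compare(author_tag, original, modified)


def test_run_redline_delegates_to_compare():
    engine = LegacyCompatibleEngine()
    fake_result = (b"redline-bytes", "stdout", None)
    # Patch compare() on this instance so the test never touches a binary.
    with mock.patch.object(engine, "compare", return_value=fake_result) as compare:
        result = engine.run_redline("Reviewer", b"original", b"modified")
    compare.assert_called_once_with("Reviewer", b"original", b"modified")
    assert result == fake_result
```

Keeping the alias a one-line delegation means the legacy and new entry points cannot drift apart in behavior.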

This commit introduces a pluggable architecture for document comparison engines,
allowing users to choose between Open-Xml-PowerTools and Docxodus backends.

Key changes:
- Add abstract ComparisonEngine base class defining the engine interface
- Refactor XmlPowerToolsEngine to inherit from ComparisonEngine
- Add DocxodusEngine for the modern Docxodus backend (.NET 8.0)
- Create EngineRegistry for dynamic engine discovery and selection
- Add get_engine(), list_engines(), list_available_engines() functions
- Create Docxodus C# CLI project using Docxodus NuGet package
- Update build system to support building both engines
- Add comprehensive test suite (79 tests) covering:
  - Base class interface contracts
  - Engine implementations with mocking
  - Registry functionality
  - Package imports and API
  - Backward compatibility with run_redline() method

Breaking changes: None - maintains full backward compatibility

Usage:
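  from python_redlines import get_engine

  engine = get_engine()             # default: openxml-powertools
  engine = get_engine('docxodus')   # or select Docxodus by name
  # author: name recorded on the tracked changes; original/modified: the two documents to compare
  redline, _, _ = engine.compare(author, original, modified)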

- Run tests on Ubuntu, Windows, and macOS
- Test against Python 3.9, 3.10, 3.11, and 3.12
- Build .NET binaries before running tests
- Include coverage reporting via Codecov
- Add a separate unit-test-only job for quick feedback
- Add a lint check with ruff
- Add a build verification job

- Add a Quick Start section with installation and basic usage
- Document the pluggable engine architecture
- Add a comparison table of the available engines
- Explain the benefits of Docxodus and the plan to make it the default engine
- Add an API reference section covering the ComparisonEngine interface
- Update the project structure documentation
- Add a supported-platforms table
- Include development setup and build instructions
- Add a roadmap section
- Add a CI badge and other status badges
- Link to Docxodus, Open-Xml-PowerTools, and other resources