Refactor for multi strategy  #110

@gkorland

To let GraphRAG-SDK support different strategies for knowledge construction and for LLM querying, decouple knowledge base construction and query execution from the core workflow. The Strategy and Factory design patterns, clear interfaces, and configuration-driven selection make this possible. Here’s a step-by-step refactor plan:


1. Identify Key Responsibilities

  • Knowledge Construction: How the knowledge base is built from source data (e.g., chunking, embedding, indexing).

  • Query Execution: How user queries are processed and how results are retrieved from the knowledge base for the LLM.


2. Define Abstract Interfaces

Define Python abstract base classes (ABCs) or interfaces for each responsibility.

knowledge_strategy.py

from abc import ABC, abstractmethod


class KnowledgeConstructionStrategy(ABC):
    @abstractmethod
    def construct(self, source_data):
        """Build and return a knowledge base from the source data."""

query_strategy.py

from abc import ABC, abstractmethod


class QueryStrategy(ABC):
    @abstractmethod
    def query(self, knowledge_base, user_query):
        """Return results for user_query retrieved from knowledge_base."""
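
The practical effect of using an ABC here is that a subclass which forgets to implement the abstract method cannot be instantiated. The short sketch below shows that behavior; `IncompleteStrategy` and `EchoStrategy` are hypothetical names used only for illustration.

```python
from abc import ABC, abstractmethod


class KnowledgeConstructionStrategy(ABC):
    @abstractmethod
    def construct(self, source_data):
        """Build and return a knowledge base from the source data."""


class IncompleteStrategy(KnowledgeConstructionStrategy):
    """Hypothetical subclass that forgets to implement construct()."""


class EchoStrategy(KnowledgeConstructionStrategy):
    """Hypothetical subclass that returns its input unchanged."""

    def construct(self, source_data):
        return source_data


try:
    IncompleteStrategy()  # abstract method missing, so this raises TypeError
except TypeError as exc:
    print(f"rejected: {exc}")

print(EchoStrategy().construct("ok"))
```

This means a missing method is reported when the strategy is created, not later when the pipeline first calls it.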

3. Implement Concrete Strategies

Create specific implementations for different approaches.

e.g., ChunkingKnowledgeStrategy, EmbeddingKnowledgeStrategy

class ChunkingKnowledgeStrategy(KnowledgeConstructionStrategy):
    def construct(self, source_data):
        # implementation for chunking
        raise NotImplementedError


class EmbeddingKnowledgeStrategy(KnowledgeConstructionStrategy):
    def construct(self, source_data):
        # implementation for embeddings
        raise NotImplementedError

e.g., SimpleQueryStrategy, SemanticQueryStrategy

class SimpleQueryStrategy(QueryStrategy):
    def query(self, knowledge_base, user_query):
        # implementation for simple keyword matching
        raise NotImplementedError


class SemanticQueryStrategy(QueryStrategy):
    def query(self, knowledge_base, user_query):
        # implementation for semantic search
        raise NotImplementedError
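
As a rough illustration of what filled-in strategies might look like, the sketch below implements a fixed-size word chunker and a keyword matcher. Everything specific here is an assumption made for the example and not part of GraphRAG-SDK: the class names `FixedSizeChunkingStrategy` and `KeywordQueryStrategy`, the `chunk_size` parameter, and the choice of a plain list of strings as the knowledge base.

```python
from abc import ABC, abstractmethod


class KnowledgeConstructionStrategy(ABC):
    @abstractmethod
    def construct(self, source_data):
        """Build and return a knowledge base from the source data."""


class QueryStrategy(ABC):
    @abstractmethod
    def query(self, knowledge_base, user_query):
        """Return results for user_query retrieved from knowledge_base."""


class FixedSizeChunkingStrategy(KnowledgeConstructionStrategy):
    """Hypothetical strategy: split text into chunks of chunk_size words."""

    def __init__(self, chunk_size=5):
        self.chunk_size = chunk_size

    def construct(self, source_data):
        words = source_data.split()
        return [
            " ".join(words[i:i + self.chunk_size])
            for i in range(0, len(words), self.chunk_size)
        ]


class KeywordQueryStrategy(QueryStrategy):
    """Hypothetical strategy: keep chunks that share a word with the query."""

    def query(self, knowledge_base, user_query):
        terms = {term.lower() for term in user_query.split()}
        return [
            chunk for chunk in knowledge_base
            if terms & {word.lower() for word in chunk.split()}
        ]
```

A real implementation would likely build graph structures instead of a list of strings; the point is only that each strategy owns its own representation behind the shared interface.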

4. Refactor the Core Workflow

Modify the main orchestration code to use these abstractions.

class GraphRAGPipeline:
    def __init__(self, knowledge_strategy, query_strategy):
        self.knowledge_strategy = knowledge_strategy
        self.query_strategy = query_strategy
        self.knowledge_base = None  # set by build_knowledge()

    def build_knowledge(self, data):
        self.knowledge_base = self.knowledge_strategy.construct(data)

    def answer_query(self, user_query):
        if self.knowledge_base is None:
            raise RuntimeError("Call build_knowledge() before answer_query().")
        return self.query_strategy.query(self.knowledge_base, user_query)
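
A usage sketch may help show how the pipeline is wired together. The two strategy classes below, `LineSplitStrategy` and `SubstringQueryStrategy`, are hypothetical stand-ins that only need to provide `construct` and `query`; the pipeline is repeated here, with a guard against querying before building, so the example runs on its own.

```python
class GraphRAGPipeline:
    def __init__(self, knowledge_strategy, query_strategy):
        self.knowledge_strategy = knowledge_strategy
        self.query_strategy = query_strategy
        self.knowledge_base = None  # set by build_knowledge()

    def build_knowledge(self, data):
        self.knowledge_base = self.knowledge_strategy.construct(data)

    def answer_query(self, user_query):
        if self.knowledge_base is None:
            raise RuntimeError("Call build_knowledge() before answer_query().")
        return self.query_strategy.query(self.knowledge_base, user_query)


class LineSplitStrategy:
    """Hypothetical strategy: one fact per non-empty input line."""

    def construct(self, source_data):
        return [line.strip() for line in source_data.splitlines() if line.strip()]


class SubstringQueryStrategy:
    """Hypothetical strategy: case-insensitive substring match."""

    def query(self, knowledge_base, user_query):
        needle = user_query.lower()
        return [fact for fact in knowledge_base if needle in fact.lower()]


pipeline = GraphRAGPipeline(LineSplitStrategy(), SubstringQueryStrategy())
pipeline.build_knowledge("Alice works at Acme\nBob works at Initech")
print(pipeline.answer_query("acme"))
```

Swapping either strategy for another object with the same method changes the behavior without any edit to `GraphRAGPipeline`.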

5. Provide a Strategy Factory or Configuration

Allow end-users to select strategies via configuration or at runtime.

def get_knowledge_strategy(strategy_name):
    if strategy_name == "chunking":
        return ChunkingKnowledgeStrategy()
    elif strategy_name == "embedding":
        return EmbeddingKnowledgeStrategy()
    # Add more as needed
    raise ValueError(f"Unknown knowledge strategy: {strategy_name!r}")


def get_query_strategy(strategy_name):
    if strategy_name == "simple":
        return SimpleQueryStrategy()
    elif strategy_name == "semantic":
        return SemanticQueryStrategy()
    # Add more as needed
    raise ValueError(f"Unknown query strategy: {strategy_name!r}")
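
One possible variation on the if/elif factory is a dictionary that maps names to classes, so adding a strategy becomes a one-line change and the error message can list the valid names. This is a sketch, not the project's API: the `KNOWLEDGE_STRATEGIES` mapping and the `**kwargs` pass-through are assumptions, and the two classes are empty placeholders for the real implementations.

```python
class ChunkingKnowledgeStrategy:
    """Placeholder standing in for the real chunking strategy."""


class EmbeddingKnowledgeStrategy:
    """Placeholder standing in for the real embedding strategy."""


# Hypothetical registry: strategy name -> class.
KNOWLEDGE_STRATEGIES = {
    "chunking": ChunkingKnowledgeStrategy,
    "embedding": EmbeddingKnowledgeStrategy,
}


def get_knowledge_strategy(strategy_name, **kwargs):
    """Instantiate the named strategy, forwarding any constructor kwargs."""
    try:
        strategy_cls = KNOWLEDGE_STRATEGIES[strategy_name]
    except KeyError:
        known = ", ".join(sorted(KNOWLEDGE_STRATEGIES))
        raise ValueError(
            f"Unknown knowledge strategy {strategy_name!r}; expected one of: {known}"
        ) from None
    return strategy_cls(**kwargs)
```

The same shape would work for query strategies with a second mapping.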

6. Update Documentation & Tests

  • Document how to add new strategies.

  • Write tests for each strategy and for the pipeline using different combinations.
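
A sketch of what a combination test might look like with the standard `unittest` module, using `itertools.product` and `subTest` so each pairing is reported separately. The four stub strategies are hypothetical and exist only to exercise the pipeline; real tests would use the actual strategy classes.

```python
import itertools
import unittest


class GraphRAGPipeline:
    def __init__(self, knowledge_strategy, query_strategy):
        self.knowledge_strategy = knowledge_strategy
        self.query_strategy = query_strategy
        self.knowledge_base = None

    def build_knowledge(self, data):
        self.knowledge_base = self.knowledge_strategy.construct(data)

    def answer_query(self, user_query):
        return self.query_strategy.query(self.knowledge_base, user_query)


# Hypothetical stub strategies used only by the test.
class UppercaseConstruction:
    def construct(self, source_data):
        return [source_data.upper()]


class WordListConstruction:
    def construct(self, source_data):
        return source_data.split()


class FirstItemQuery:
    def query(self, knowledge_base, user_query):
        return knowledge_base[:1]


class AllItemsQuery:
    def query(self, knowledge_base, user_query):
        return list(knowledge_base)


class PipelineCombinationTest(unittest.TestCase):
    def test_every_combination_returns_results(self):
        constructions = [UppercaseConstruction, WordListConstruction]
        queries = [FirstItemQuery, AllItemsQuery]
        for construct_cls, query_cls in itertools.product(constructions, queries):
            with self.subTest(construction=construct_cls.__name__,
                              query=query_cls.__name__):
                pipeline = GraphRAGPipeline(construct_cls(), query_cls())
                pipeline.build_knowledge("graph rag")
                result = pipeline.answer_query("anything")
                self.assertIsInstance(result, list)
                self.assertGreaterEqual(len(result), 1)
```

Saved in a test module, this can be run with `python -m unittest`.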


7. (Optional) Plug-in System

For extensibility, consider a plug-in system to register custom strategies outside the core codebase.
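
One lightweight way to approach this is a registration decorator: third-party code decorates its own class, and the factory looks it up by name. The sketch below is an assumption about how that could look; `register_query_strategy`, the `_QUERY_STRATEGIES` dictionary, and `ReverseQueryStrategy` are hypothetical names, not existing GraphRAG-SDK functions.

```python
_QUERY_STRATEGIES = {}  # hypothetical registry: name -> strategy class


def register_query_strategy(name):
    """Class decorator that registers a query strategy under `name`."""
    def decorator(cls):
        if name in _QUERY_STRATEGIES:
            raise ValueError(f"Query strategy {name!r} is already registered")
        _QUERY_STRATEGIES[name] = cls
        return cls
    return decorator


def get_query_strategy(name):
    """Instantiate a registered query strategy by name."""
    if name not in _QUERY_STRATEGIES:
        raise ValueError(f"Unknown query strategy: {name!r}")
    return _QUERY_STRATEGIES[name]()


# Code outside the core package could register its own strategy like this.
@register_query_strategy("reverse")
class ReverseQueryStrategy:
    def query(self, knowledge_base, user_query):
        return list(reversed(knowledge_base))
```

Registration happens at import time, so the module defining the custom strategy must be imported before the factory is asked for it.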


Summary Table

| Component          | Interface/Class                            | Example Implementation    |
|--------------------|--------------------------------------------|---------------------------|
| Knowledge Building | KnowledgeConstructionStrategy (ABC)        | ChunkingKnowledgeStrategy |
| Query Execution    | QueryStrategy (ABC)                        | SemanticQueryStrategy     |
| Pipeline           | GraphRAGPipeline                           | Uses above via DI         |
| Factory/Config     | get_knowledge_strategy, get_query_strategy | YAML/JSON/CLI selection   |


Benefits

  • You can easily add new strategies for knowledge construction or querying without touching the pipeline.

  • Clean separation of concerns.

  • Users can mix-and-match strategies to suit their use case.

To tailor the refactor to the GraphRAG-SDK project, so you can flexibly provide different strategies for knowledge construction and querying, here’s a concrete, actionable plan you can follow:


1. Define Strategy Interfaces

Create abstract base classes or interfaces for both knowledge construction and query strategies. This makes it easy to plug in new approaches without changing your pipeline code.

Example:

# knowledge_strategies/base.py
from abc import ABC, abstractmethod


class KnowledgeStrategy(ABC):
    @abstractmethod
    def construct(self, data):
        """Build and return a knowledge base from the input data."""


# query_strategies/base.py
from abc import ABC, abstractmethod


class QueryStrategy(ABC):
    @abstractmethod
    def query(self, kb, user_query):
        """Return results for user_query retrieved from the knowledge base kb."""

2. Implement Concrete Strategies

Implement various strategies for both construction and querying. For instance:

# knowledge_strategies/chunking.py
from .base import KnowledgeStrategy


class ChunkingStrategy(KnowledgeStrategy):
    def construct(self, data):
        # Chunking logic here
        raise NotImplementedError


# query_strategies/semantic.py
from .base import QueryStrategy


class SemanticQueryStrategy(QueryStrategy):
    def query(self, kb, user_query):
        # Semantic search logic here
        raise NotImplementedError

3. Refactor the Main Pipeline

Modify your pipeline to accept strategy instances:

class GraphRAGPipeline:
    def __init__(self, knowledge_strategy, query_strategy):
        self.knowledge_strategy = knowledge_strategy
        self.query_strategy = query_strategy
        self.knowledge_base = None  # set by build_knowledge()

    def build_knowledge(self, source_data):
        self.knowledge_base = self.knowledge_strategy.construct(source_data)

    def answer(self, query):
        if self.knowledge_base is None:
            raise RuntimeError("Call build_knowledge() before answer().")
        return self.query_strategy.query(self.knowledge_base, query)

4. Add a Strategy Factory or Config Loader

Allow dynamic selection of strategies (from config, CLI, or elsewhere):

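def get_knowledge_strategy(name):
    if name == "chunking":
        return ChunkingStrategy()
    # Add more as needed
    raise ValueError(f"Unknown knowledge strategy: {name!r}")


def get_query_strategy(name):
    if name == "semantic":
        return SemanticQueryStrategy()
    # Add more as needed
    raise ValueError(f"Unknown query strategy: {name!r}")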

Allow users to configure which strategy to use via config files or command-line arguments.
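
For the command-line case, a minimal sketch with the standard `argparse` module could look like the following. The flag names `--knowledge-strategy` and `--query-strategy`, the default values, and the two name-to-class mappings are assumptions chosen for illustration; the strategy classes are empty placeholders.

```python
import argparse


class ChunkingStrategy:
    """Placeholder standing in for the real chunking strategy."""


class SemanticQueryStrategy:
    """Placeholder standing in for the real semantic query strategy."""


# Hypothetical name -> class mappings.
KNOWLEDGE_STRATEGIES = {"chunking": ChunkingStrategy}
QUERY_STRATEGIES = {"semantic": SemanticQueryStrategy}


def build_parser():
    parser = argparse.ArgumentParser(description="Select GraphRAG strategies (sketch)")
    # `choices` makes argparse reject unknown names with a usage error.
    parser.add_argument("--knowledge-strategy",
                        choices=sorted(KNOWLEDGE_STRATEGIES), default="chunking")
    parser.add_argument("--query-strategy",
                        choices=sorted(QUERY_STRATEGIES), default="semantic")
    return parser


def strategies_from_args(argv=None):
    """Parse argv (or sys.argv when None) and return the two strategy instances."""
    args = build_parser().parse_args(argv)
    return (
        KNOWLEDGE_STRATEGIES[args.knowledge_strategy](),
        QUERY_STRATEGIES[args.query_strategy](),
    )
```

Deriving `choices` from the mappings keeps the CLI help in sync with whatever strategies are registered.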


5. Document and Test

  • Add clear documentation on how to add new strategies.

  • Write unit tests for each strategy and for the pipeline with different combinations.


6. (Optional) Plug-in System

For maximum flexibility, consider using entry points or a plug-in system so users can add strategies without modifying core code.


Summary Table

| Component          | Interface         | Example Implementation |
|--------------------|-------------------|------------------------|
| Knowledge Building | KnowledgeStrategy | ChunkingStrategy       |
| Query Execution    | QueryStrategy     | SemanticQueryStrategy  |
| Pipeline           | GraphRAGPipeline  | Uses above via DI      |
| Factory/Config      | get_knowledge_strategy,
