An enhanced Streamlit application that extracts graph data (entities and relationships) from text input using LangChain and OpenAI's GPT-4o model, and generates interactive graphs with intelligent chunking and real-time progress tracking.
Example graph generated from Wardley Mapping documentation showing entities and relationships with interactive visualisation.
- Smart Text Chunking: Intelligent semantic splitting using markdown headers with 100k token limit per chunk
- Real-time Progress Tracking: Visual progress bar with time estimates and chunk-by-chunk status updates
- Multiple File Format Support: Upload both `.txt` and `.md` files for optimal chunking results
- Multi-format Graph Export: Save graphs as HTML, JSON, GraphML, and GML formats
- Enhanced Error Handling: Comprehensive error management with proper cleanup
- Improved Performance: Optimised processing for large documents (tested with 229-chunk files)
- File Upload: Support for both `.txt` and `.md` files with automatic format detection
- Direct Text Input: Paste or type text directly into the interface
- Large Document Handling: Automatic chunking for documents exceeding token limits
- Intelligent Chunking: Semantic markdown header-based splitting (# ## ### ####) with character-based fallback
- Real-time Progress: Visual progress bar with time estimates and status messages
- Interactive Visualisation: Physics-based graph layout with dark theme and filter controls
- Entity Relationship Extraction: Powered by OpenAI's GPT-4o model via LangChain
- HTML: Interactive web visualisation with PyVis
- JSON: Standard format for web applications and analysis tools
- GraphML: XML-based format compatible with Gephi, yEd, Cytoscape
- GML: Graph Modeling Language for NetworkX, igraph, and analysis libraries
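The three data formats can all be written straight from a NetworkX graph, which is how they stay compatible with the tools listed above. A minimal sketch (the node and edge labels here are made up for illustration, not the app's actual output):

```python
import json

import networkx as nx

# Build a tiny directed graph of the kind the app extracts
G = nx.DiGraph()
G.add_node("Simon Wardley", label="Person")
G.add_node("Wardley Mapping", label="Concept")
G.add_edge("Simon Wardley", "Wardley Mapping", relation="CREATED")

# Export to the three data formats listed above
nx.write_graphml(G, "graph.graphml")    # Gephi, yEd, Cytoscape
nx.write_gml(G, "graph.gml")            # NetworkX, igraph, R
with open("graph.json", "w") as f:
    json.dump(nx.node_link_data(G), f)  # D3.js, Cytoscape.js
```

Each file round-trips: `nx.read_graphml` and `nx.read_gml` load the exports back for further analysis.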
- Standalone Web App: Modern JavaScript application for interactive graph exploration
- File Loading: Support for JSON, GraphML, GML files with drag & drop interface
- Advanced Filtering: Multi-select filters for nodes and relationships with real-time search
- Export Capabilities: PNG, CSV, and all standard graph formats
- No Server Required: Completely client-side application that works offline
- Python 3.8 or higher
- OpenAI API key
- 4GB+ RAM recommended for large documents
```
pip install -r requirements.txt
```

Core Dependencies:

- `langchain` (>= 0.1.0): Core LLM framework
- `langchain-experimental` (>= 0.0.45): Graph transformer features
- `langchain-openai` (>= 0.1.0): OpenAI GPT-4o integration
- `python-dotenv` (>= 1.0.0): Environment variable support
- `pyvis` (>= 0.3.2): Interactive graph visualisation
- `streamlit` (>= 1.32.0): Web UI framework
- `networkx`: Graph data structures and algorithms
- `tiktoken`: Token counting and text processing
- Clone this repository:

  ```
  git clone https://github.com/tractorjuice/knowledge-graph-llms.git
  cd knowledge-graph-llms
  ```

- Create environment file:

  ```
  echo "OPENAI_API_KEY=your_openai_api_key_here" > .env
  ```

- Install dependencies:

  ```
  pip install -r requirements.txt
  ```
Run the Streamlit app:

```
streamlit run streamlit_app.py
```

The Streamlit application will open at http://localhost:8501.

To open the standalone graph explorer:

```
# Option 1: Direct file access
open js-app/index.html

# Option 2: Local server (recommended)
cd js-app
python -m http.server 8080
# Then visit http://localhost:8080
```

To run the original notebook:

```
jupyter notebook knowledge_graph.ipynb
```
Knowledge graphs are powerful data structures that represent information as interconnected entities and relationships. To learn more about how knowledge graphs can be used in various applications and their benefits, see this comprehensive guide: Everything You Ever Wanted to Know About GraphRAG.
- Choose Input Method: Select "Upload file" or "Input text" from the sidebar
- Provide Content:
  - Upload: Select a `.txt` or `.md` file (markdown files get better semantic chunking)
  - Direct Input: Paste or type text directly
- Generate Graph: Click "Generate Knowledge Graph"
- Monitor Progress: Watch real-time progress with time estimates
- Review Results: View the generated graph in the Streamlit interface
- Open Graph Explorer: Navigate to `js-app/index.html` or start a local server
- Load Generated Data: Click "Load Graph", then use the quick load buttons for generated files
- Advanced Analysis:
- Apply sophisticated filters for nodes and relationships
- Use real-time search with highlighting
- Explore connections with dynamic node highlighting
- Export Results: Generate PNG images, CSV data, or other formats as needed
- Streamlit: Optimised for text processing and graph generation
- JavaScript App: Optimised for interactive exploration and advanced analysis
- Seamless Integration: Files generated by Streamlit work directly in JavaScript app
- Visual Progress Bar: Shows completion percentage
- Status Messages: Real-time updates like "Processing chunk 3/15..."
- Time Estimates: Dynamic estimates improve as processing continues
- Chunk Details: Individual chunk completion times displayed
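The dynamic time estimate can be derived from the average duration of the chunks completed so far, which is why it sharpens as processing continues. A minimal sketch of that logic (the function name is illustrative, not the app's actual code):

```python
def eta_seconds(chunk_times: list[float], chunks_remaining: int) -> float:
    """Estimate remaining seconds from the mean of completed chunk durations.

    With no completed chunks yet, there is nothing to extrapolate from,
    so the estimate defaults to zero.
    """
    if not chunk_times:
        return 0.0
    mean = sum(chunk_times) / len(chunk_times)
    return mean * chunks_remaining

# After chunks taking 12s and 14s, with 13 of 15 chunks left:
print(eta_seconds([12.0, 14.0], 13))  # 169.0
```

Each new chunk time feeds back into the mean, so early wild estimates converge toward the true per-chunk cost.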
- Navigation: Drag nodes, zoom with mouse wheel
- Information: Hover over nodes and edges for details
- Filtering: Use built-in filter menu for nodes and relationships
- Physics: ForceAtlas2 layout with optimised spring constants
- Markdown (.md): Optimal for semantic chunking with headers
- Text (.txt): Supported with character-based fallback chunking
- Large Files: Automatically chunked (tested up to 175k tokens/229 chunks)
Input Text → Token Analysis → Semantic Chunking → Entity Extraction → Graph Generation → Multi-format Export
- Chunking Strategy:
  - Primary: Markdown header splitting (# ## ### ####)
  - Fallback: Recursive character splitting with 200-char overlap
  - Limit: 100k tokens per chunk (respects GPT-4 context window)
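The character-based fallback amounts to a sliding window with overlap, so entities that straddle a chunk boundary still appear intact in at least one chunk. A pure-Python sketch of the idea (the app itself uses LangChain's splitters; the sizes here are illustrative):

```python
def fallback_chunks(text: str, max_chars: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks that overlap by `overlap` characters."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        # Step back by the overlap so boundary content is duplicated
        start = end - overlap
    return chunks

parts = fallback_chunks("x" * 2500, max_chars=1000, overlap=200)
print([len(p) for p in parts])  # [1000, 1000, 900]
```

The duplication costs a few extra tokens per chunk but avoids silently splitting a relationship across two prompts.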
- Entity Extraction:
  - LangChain's `LLMGraphTransformer` with GPT-4o
  - Parallel processing of chunks with progress tracking
  - Automatic relationship detection and validation
- Visualisation:
  - PyVis network with ForceAtlas2 physics
  - Dark theme with customisable spring constants
  - Filter menu for interactive exploration
| Format | Use Case | Compatible Tools |
|---|---|---|
| HTML | Interactive web visualisation | Any web browser |
| JSON | Web apps, custom analysis | D3.js, Cytoscape.js, custom tools |
| GraphML | Advanced network analysis | Gephi, yEd, Cytoscape |
| GML | Graph libraries and research | NetworkX, igraph, R packages |
- Processing Speed: ~10-15 seconds per chunk (depends on content complexity)
- Memory Usage: Scales with document size (4GB RAM recommended for large files)
- Token Limits: Handles documents up to millions of tokens via intelligent chunking
- Tested Scale: Successfully processed 175k token documents (229 chunks)
This project is an enhanced version based on the original knowledge graph tutorial:
Original Source: Knowledge Graph Tutorial
Enhanced by: Claude Code (Anthropic's AI Assistant)

Enhancements Include:
- Intelligent text chunking with semantic splitting
- Real-time progress tracking with time estimates
- Multiple file format support (.txt, .md)
- Multi-format graph export (JSON, GraphML, GML)
- Improved error handling and user experience
- Performance optimisations for large documents
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- LangChain Team: For the experimental graph transformer capabilities
- OpenAI: For the GPT-4o model powering entity extraction
- PyVis/vis.js: For the interactive graph visualisation framework
- Original Tutorial: Foundation knowledge graph implementation
- Community: Feedback and suggestions for improvements
Version: 2.0
Last Updated: January 2025
Compatibility: Python 3.8+, OpenAI API v1+
