This repository contains the implementation of an AI-Enhanced Document Processing Pipeline using Argo Workflows on OpenShift Local (CRC), with Ollama for AI/LLM capabilities and MinIO for artifact storage.
- Overview
- Architecture
- Prerequisites
- Quick Start
- Scripts and Tools
- Services and Access
- Usage Guide
- Troubleshooting
- Next Steps
This project implements an enterprise-grade document processing pipeline that leverages:
- OpenShift Local (CRC) - Container platform for running the infrastructure
- Argo Workflows - Workflow orchestration for document processing pipelines
- Ollama - Local LLM runtime for AI-powered document analysis
- MinIO - S3-compatible object storage for artifacts and documents
- Automated setup and deployment scripts
- Idempotent installation process (safe to run multiple times)
- AI-powered document processing workflows
- NEW: Smart Inbox - Drop documents in one place, AI organizes them automatically
- Event-driven architecture with Argo Events
- Scalable architecture suitable for production deployment
- Complete local development environment
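Under the hood, each AI-powered step talks to Ollama over its REST API. As a quick sketch, the non-streaming request body for Ollama's `/api/generate` endpoint can be built like this (model name and prompt are placeholders; the sketch only constructs the body, so it runs without a cluster):

```python
import json

def generate_request(model: str, prompt: str) -> bytes:
    """Build the JSON body for Ollama's POST /api/generate endpoint.

    With "stream": false, Ollama returns a single JSON object whose
    "response" field holds the full completion.
    """
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

body = generate_request("llama3.2:3b", "Classify this document: ...")
print(json.loads(body)["model"])  # llama3.2:3b
```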
```
┌───────────────────────────────────────────────────────────┐
│                   OpenShift Local (CRC)                   │
├───────────────────────────────────────────────────────────┤
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐      │
│  │    Argo     │   │   Ollama    │   │    MinIO    │      │
│  │  Workflows  │   │   (LLMs)    │   │  (Storage)  │      │
│  └─────────────┘   └─────────────┘   └─────────────┘      │
│         │                 │                 │             │
│  ┌────────────────────────────────────────────────────┐   │
│  │            Document Processing Pipeline            │   │
│  └────────────────────────────────────────────────────┘   │
└───────────────────────────────────────────────────────────┘
```
System Requirements
- macOS, Linux, or Windows with WSL2
- Minimum 16GB RAM (32GB recommended)
- 60GB+ free disk space
- 6+ CPU cores
Required Accounts
- Red Hat Developer account (free): https://developers.redhat.com/register
- Pull secret from: https://console.redhat.com/openshift/create/local
Software
- OpenShift Local (CRC) - downloaded from Red Hat console
- Git
- curl
```bash
git clone <repository-url>
cd argoai

# Make scripts executable
chmod +x *.sh

# Option 1: Run the new modular setup (recommended)
./setup.sh             # Sets up CRC and installs all tools (Helm, Argo CLI, MinIO Client, jq)
./deploy-with-helm.sh  # Deploys services using Helm (includes CRD installation)

# Option 2: Run the legacy all-in-one setup
./scripts/setup-crc-openshift.sh  # Complete setup (deprecated, doesn't install CLI tools)
```

The setup.sh script will:
- Guide you through CRC installation if needed
- Prompt for your Red Hat pull secret (hidden input)
- Set up OpenShift Local with proper resources
- Install required CLI tools:
- Helm - Package manager for Kubernetes
- Argo CLI - Native workflow management commands
- MinIO Client (mc) - Object storage management
- jq - JSON processing utility
- Configure the OpenShift environment
- Display access URLs and credentials
Then deploy-with-helm.sh will:
- Deploy all required services (Argo, Ollama, MinIO)
- Load AI models into Ollama
- Configure integrations and security
For detailed Helm deployment options and configurations, see the Helm Chart README.
```bash
# Source the environment setup (do this once per terminal session)
source ./setup-env.sh

# Load the recommended model for document processing
./scripts/ollama-load-model.sh -m llama3.2:3b

# Or use the convenience function after sourcing setup-env.sh
load-model llama3.2:3b
```

| Script | Purpose | Usage |
|---|---|---|
| setup.sh | NEW - Set up OpenShift Local (CRC) and install all tools (Helm, Argo CLI, MinIO Client, jq) | `./setup.sh` |
| deploy-with-helm.sh | NEW - Deploy services using Helm charts | `./deploy-with-helm.sh` |
| setup-crc-openshift.sh | Legacy all-in-one setup (deprecated) | `./scripts/setup-crc-openshift.sh` |
| setup-env.sh | Environment setup with aliases and functions | `source ./setup-env.sh` |
| ollama-load-model.sh | Load AI models into Ollama (wrapper) | `./scripts/ollama-load-model.sh -m MODEL` |
| load-ollama-models.sh | Advanced model management with interactive menu | `./scripts/load-ollama-models.sh [options]` |
| ollama-api-helper.sh | Interact with Ollama via API | `./scripts/ollama-api-helper.sh` |
| submit-test-workflow.sh | Submit and manage test workflows | `./scripts/submit-test-workflow.sh [options]` |
| fix-common-issues.sh | Fix common issues and check system health | `./scripts/fix-common-issues.sh` |
| deploy-full-workflow.sh | Deploy the complete AI workflow template | `./scripts/deploy-full-workflow.sh` |
scripts/setup-crc-openshift.sh (deprecated - use setup.sh instead)
- Fully idempotent (safe to run multiple times)
- Automatic password extraction from CRC
- Handles pull secret configuration
- Deploys all services with proper configurations
- Fixes common issues automatically
setup-env.sh
- Sets up kubectl/oc environment
- Provides convenient aliases
- Exports service URLs as environment variables
- Includes helper functions for common tasks
Model Loading Scripts
- Support for multiple models
- Progress tracking
- Model testing capabilities
- Both CLI and interactive modes
scripts/submit-test-workflow.sh
- Automated workflow submission with defaults
- Predefined sample documents for testing
- Model availability checking
- Workflow monitoring and management
- Integrated with CRC environment setup
scripts/fix-common-issues.sh
- Interactive troubleshooter for known issues
- Fixes Argo UI 502 errors automatically
- Repairs workflow controller configuration
- Checks system health and pod status
- One-click fix for all common problems
After setup, the following services are available:
- Argo Workflows UI: https://argo-server-argo-ai-demo.apps-crc.testing
- MinIO Console: https://minio-console-argo-ai-demo.apps-crc.testing
- Ollama API: https://ollama-argo-ai-demo.apps-crc.testing
- OpenShift Console: https://console-openshift-console.apps-crc.testing
- OpenShift Admin: kubeadmin / (auto-extracted password)
- OpenShift Developer: developer / developer
- MinIO: Configured via environment variables (see .env.example)
```bash
# After sourcing setup-env.sh
argo-ui       # Open Argo UI in browser
minio-ui      # Open MinIO Console
openshift-ui  # Open OpenShift Console
```

```bash
# List available models
./scripts/load-ollama-models.sh -l

# Load a specific model
./scripts/ollama-load-model.sh -m mistral:7b

# Check loaded models
./scripts/load-ollama-models.sh -s

# Test a model
./scripts/ollama-api-helper.sh  # Choose option 3
```

```bash
# After sourcing setup-env.sh
ollama-logs    # Follow Ollama logs
argo-logs      # Follow Argo server logs
workflow-logs  # Follow workflow controller logs
get-pods       # Show all pods
```

```bash
# Submit with defaults (Kubernetes README, llama3.2:3b model)
./scripts/submit-test-workflow.sh

# Use a predefined sample document
./scripts/submit-test-workflow.sh -s argo       # Argo Workflows README
./scripts/submit-test-workflow.sh -s openshift  # OpenShift README

# Submit with custom document and model
./scripts/submit-test-workflow.sh -d "https://example.com/document.txt" -m mistral:7b

# Submit and watch the workflow progress
./scripts/submit-test-workflow.sh -w

# List all workflows
./scripts/submit-test-workflow.sh -l

# Get details of a specific workflow
./scripts/submit-test-workflow.sh -g <workflow-name>
```

```bash
# Convenience function (after sourcing setup-env.sh)
submit-workflow "https://example.com/document.pdf" "llama3.2:3b"
```

```bash
oc create -f - <<EOF
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: ai-doc-process-
  namespace: argo-ai-demo
spec:
  workflowTemplateRef:
    name: ai-document-processor
  arguments:
    parameters:
      - name: doc-url
        value: "https://example.com/document.txt"
      - name: model
        value: "llama3.2:3b"
EOF
```

Quick Fix: Run ./scripts/fix-common-issues.sh for an interactive troubleshooter that can fix most issues automatically.
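Since `oc create -f` accepts JSON as well as YAML, the same manifest can also be built programmatically. A minimal sketch (parameter values are placeholders):

```python
import json

def make_workflow(doc_url: str, model: str) -> dict:
    """Build the Workflow manifest shown above as a plain dict."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {
            "generateName": "ai-doc-process-",
            "namespace": "argo-ai-demo",
        },
        "spec": {
            "workflowTemplateRef": {"name": "ai-document-processor"},
            "arguments": {
                "parameters": [
                    {"name": "doc-url", "value": doc_url},
                    {"name": "model", "value": model},
                ]
            },
        },
    }

# Pipe the output into `oc create -f -`
print(json.dumps(make_workflow("https://example.com/document.txt", "llama3.2:3b"), indent=2))
```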
- "oc command not found"

  ```bash
  eval $(crc oc-env)  # Or use the wrapper scripts that handle this automatically
  ```

- "Failed to get kubeadmin password"
  - Ensure CRC is running: `crc status`
  - Manually get the password: `crc console --credentials`

- Workflow Controller CrashLoopBackOff
  - Fixed automatically by the script
  - Manual fix: check the workflow-controller-configmap YAML formatting

- Argo UI 502 Bad Gateway Error
  - Cause: readiness probe mismatch (HTTPS probe against an HTTP server)
  - Fixed automatically by the script
  - Manual fix:

    ```bash
    oc patch deployment argo-server -n argo-ai-demo --type='json' \
      -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/readinessProbe/httpGet/scheme", "value": "HTTP"}]'
    ```

- Model pulling takes forever
  - Normal for first-time downloads
  - Monitor progress: `oc logs deployment/ollama -f -n argo-ai-demo`

- Cannot access web UIs
  - Check routes: `oc get routes -n argo-ai-demo`
  - Ensure you're using HTTPS and accepting self-signed certificates
  - Try port-forwarding as an alternative: `oc port-forward svc/argo-server 2746:2746 -n argo-ai-demo`
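The `oc patch` command above applies a standard JSON Patch (RFC 6902) `replace` operation. Its effect can be illustrated in a few lines; the probe snippet is a hypothetical slice of the pod spec, and this minimal version skips the `~0`/`~1` path escaping the full spec requires:

```python
def json_patch_replace(doc, path, value):
    """Minimal RFC 6902 'replace' supporting dict keys and list indices."""
    keys = [k for k in path.split("/") if k]
    target = doc
    for k in keys[:-1]:
        target = target[int(k)] if isinstance(target, list) else target[k]
    last = keys[-1]
    if isinstance(target, list):
        target[int(last)] = value
    else:
        target[last] = value
    return doc

# Hypothetical slice of the argo-server pod spec
spec = {"containers": [{"readinessProbe": {"httpGet": {"scheme": "HTTPS"}}}]}
json_patch_replace(spec, "/containers/0/readinessProbe/httpGet/scheme", "HTTP")
print(spec["containers"][0]["readinessProbe"]["httpGet"]["scheme"])  # HTTP
```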
```bash
# Check pod status
oc get pods -n argo-ai-demo

# View pod logs
oc logs <pod-name> -n argo-ai-demo

# Describe a pod for detailed info
oc describe pod <pod-name> -n argo-ai-demo

# Check events
oc get events -n argo-ai-demo --sort-by='.lastTimestamp'
```

- OpenShift Local (CRC) setup and configuration
- Argo Workflows deployment and configuration
- Ollama AI service deployment
- MinIO object storage deployment
- Service integrations and RBAC
- Automated setup scripts with idempotency
- Model loading utilities
- Environment setup helpers
- Full AI workflow template with complete document processing
- Document classification and priority detection
- Type-specific information extraction
- Intelligent routing and report generation
- Monitoring and observability setup (Prometheus ServiceMonitor)
- Production deployment guide
- Performance tuning and optimization
- Batch processing capabilities
- API gateway for external integration
Problem: The Argo Server pod is not ready due to authentication issues.
Solution: The Helm chart has been updated to include --auth-mode=server. If you still face this issue:
```bash
# Check pod status
oc get pods -n argo-ai-demo | grep argo-server

# If the pod is 0/1, upgrade the Helm release
helm upgrade ai-pipeline ./helm/ai-document-pipeline \
  --namespace argo-ai-demo \
  --values ./helm/ai-document-pipeline/values-dev.yaml \
  --no-hooks
```

Problem: Logs show warnings about workflowartifactgctasks and workflowtaskresults.
Solution: These are optional CRDs and the warnings don't affect basic functionality. They can be safely ignored for development use.
Problem: The load-models job shows CrashLoopBackOff due to missing jq command.
Solution: The models are actually loaded successfully. You can verify and clean up:
```bash
# Check loaded models
curl -s https://ollama-argo-ai-demo.apps-crc.testing/api/tags --insecure | jq -r '.models[].name'

# Delete the failed job
oc delete job -l app.kubernetes.io/component=ollama-models -n argo-ai-demo
```

Problem: The script can't find the Ollama pod due to a label mismatch.
Solution: The script has been updated to use the correct label. Pull the latest changes or manually update:

```
# The correct label is:
app.kubernetes.io/component=ollama
```

Problem: helm install fails with a timeout waiting for post-install hooks.
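If `jq` is missing (the root cause of the failed job above), the same extraction can be done in a few lines of Python; the payload shape mirrors Ollama's `/api/tags` response, with illustrative model names:

```python
import json

# Illustrative /api/tags payload; real responses carry more fields per model
sample = '{"models": [{"name": "llama3.2:3b"}, {"name": "mistral:7b"}]}'

def model_names(tags_json: str) -> list[str]:
    """Equivalent of: jq -r '.models[].name'"""
    return [m["name"] for m in json.loads(tags_json)["models"]]

print(model_names(sample))  # ['llama3.2:3b', 'mistral:7b']
```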
Solution: Use the --no-hooks flag or increase the timeout:

```bash
# Install without hooks
helm install ai-pipeline ./helm/ai-document-pipeline \
  --create-namespace \
  --namespace argo-ai-demo \
  --values ./helm/ai-document-pipeline/values-dev.yaml \
  --no-hooks
```

```bash
# Check all pods status
oc get pods -n argo-ai-demo

# Check services and routes
oc get svc,route -n argo-ai-demo

# Test Ollama
curl -s https://ollama-argo-ai-demo.apps-crc.testing/api/tags --insecure

# Test Argo UI
curl -s -o /dev/null -w "%{http_code}" https://argo-server-argo-ai-demo.apps-crc.testing --insecure
```
1. Load AI Models: If you haven't already, load at least one model:

   ```bash
   ./scripts/ollama-load-model.sh -m llama3.2:3b
   ```

2. Use the Smart Inbox (NEW!): Drop documents and let AI organize them:

   ```bash
   # Configure MinIO client
   mc alias set myminio https://minio-argo-ai-demo.apps-crc.testing admin changeme

   # Drop any document into the inbox
   mc cp invoice.pdf myminio/documents/inbox/

   # The document will be automatically:
   # - Classified (invoice/contract/technical/correspondence)
   # - Prioritized (normal/urgent)
   # - Moved to an organized folder
   # - Processed with the appropriate workflow
   ```

3. Test the Pipeline: Submit a test workflow to process a document manually

4. Customize Workflows: Modify the workflow template for your specific use cases

5. Add Monitoring: Implement the ServiceMonitor for Prometheus integration

6. Scale Up: Consider moving to a production OpenShift cluster for real workloads
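The inbox routing described above can be sketched as a small function; the folder layout here is illustrative, since the real mapping lives in the workflow template:

```python
def route_document(doc_name: str, doc_type: str, priority: str) -> str:
    """Map a classified document to a destination path (hypothetical layout)."""
    valid_types = {"invoice", "contract", "technical", "correspondence"}
    if doc_type not in valid_types:
        doc_type = "unclassified"
    prefix = "urgent" if priority == "urgent" else "processed"
    return f"documents/{prefix}/{doc_type}/{doc_name}"

print(route_document("invoice.pdf", "invoice", "normal"))
# documents/processed/invoice/invoice.pdf
```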
All documentation has been organized in the docs/ directory:
- docs/STATUS.md - Current deployment status, known issues, and test results
- docs/IMPLEMENTATION_PLAN.md - Detailed step-by-step implementation guide
- docs/PLAN1.md - Original project plan and architecture
- docs/CLAUDE.md - Additional implementation notes
- docs/DEMO.md - Comprehensive demo walkthrough with tables, diagrams, and real examples
- docs/PRODUCTION-DEPLOYMENT-PLAN.md - Enterprise deployment strategies (Ansible, Python, Helm, GitOps, Terraform)
- docs/SECURITY-AUDIT.md - Security considerations and hardening guide
See docs/README.md for the complete documentation index.
- Review docs/SECURITY-AUDIT.md for a complete security checklist
- Run ./scripts/prepare-for-public.sh to check for hardcoded credentials
- Use environment variables instead of hardcoded passwords
- Never commit pull secrets, tokens, or actual passwords
- Update all credentials from their default values
See .env.example for configuration options.
Feel free to submit issues, fork the repository, and create pull requests for any improvements.
This project is licensed under the MIT License - see the LICENSE file for details.
Note: This is a development setup using OpenShift Local. For production deployments, use a full OpenShift cluster with appropriate resources and security configurations.