A Model Context Protocol (MCP) server that provides unified access to multiple observability and infrastructure tools with natural language query generation.
Now with 55+ production templates for debugging, incident response, deployment analysis, capacity planning and business metrics.
| Platform | Query Language | Use Cases |
|---|---|---|
| New Relic | NRQL via NerdGraph | APM metrics, error rates, throughput, infrastructure, deployment analysis |
| Splunk | SPL | Log search, event analysis, error investigation, root cause |
| Kubernetes | kubectl | Pod management, logs, cluster operations |
Auto-detect Platform - Automatically routes queries to the appropriate platform based on natural language
55+ Production Templates - Pre-built templates for common production scenarios:
- 14 Debug templates (current failures, errors, latency)
- 9 P1 Incident templates (critical metrics, spike analysis)
- 6 Deployment templates (version comparison, rollback validation)
- 8 Capacity templates (memory leaks, resource saturation)
- 7 Business templates (revenue impact, SLA compliance)
- 11+ Splunk log templates
Execute or Preview - Generate queries for review or execute directly against your systems
Natural Language - Just describe what you want: "deployment comparison for payment-api" or "memory leak detection"
Schema Reference - Built-in documentation for each query language and template
Windows (PowerShell):
.\setup.ps1
Windows (CMD):
setup.bat
Linux/macOS:
chmod +x setup.sh
./setup.sh
# Create virtual environment
python -m venv .venv
# Activate (Windows PowerShell)
.venv\Scripts\Activate.ps1
# Activate (Windows CMD)
.venv\Scripts\activate.bat
# Activate (Linux/macOS)
source .venv/bin/activate
# Install dependencies
pip install -e ".[dev]"
Set environment variables for the platforms you want to use:
NEW_RELIC_API_KEY=your-api-key
NEW_RELIC_ACCOUNT_ID=your-account-id
SPLUNK_HOST=splunk.example.com
SPLUNK_TOKEN=your-token
# Or use username/password
SPLUNK_USERNAME=admin
SPLUNK_PASSWORD=password
KUBECONFIG=/path/to/.kube/config
KUBE_CONTEXT=my-cluster
KUBE_NAMESPACE=default
# Path to custom API definitions
CUSTOM_APIS_PATH=apis/custom_apis.json
# API-specific tokens (used by custom_apis.json)
GITHUB_TOKEN=ghp_xxxxxxxxxxxx
PAGERDUTY_API_KEY=u+xxxxxxxxxxxxxxxx
DATADOG_API_KEY=xxxxxxxxxxxxxxxxxxxxx
DATADOG_APP_KEY=xxxxxxxxxxxxxxxxxxxxx
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/xxx/xxx/xxx
You can also create a .env file in the project root. Copy .env.example to get started:
cp .env.example .env
# Run directly
python -m src.server
# Or using the MCP CLI
mcp run src/server.py
The server will start and listen for MCP client connections. Use it with Claude Desktop, Cline, or any MCP-compatible client.
Add to your MCP client configuration (e.g., Claude Desktop config file):
{
"mcpServers": {
"prodhelp": {
"command": "python",
"args": ["-m", "src.server"],
"cwd": "d:/apps/prodhelp",
"env": {
"NEW_RELIC_API_KEY": "your-key",
"NEW_RELIC_ACCOUNT_ID": "your-account",
"SPLUNK_HOST": "splunk.example.com",
"SPLUNK_TOKEN": "your-token"
}
}
}
}
Scenario 1: Just deployed and want to validate
User: "deployment comparison for payment-api before 14:30 and after 14:30"
Server analyzes:
- Before: 0.5% error rate, 200ms latency
- After: 5.0% error rate, 450ms latency
- Conclusion: Deployment caused issues, consider rollback
Scenario 2: P1 alert fired for high error rate
User: "comprehensive debug for payment-api"
Then: "error reasons for payment-api"
Then: "failed endpoints for payment-api"
Result: Found that /api/v1/checkout endpoint has 95% errors
Root cause identified in 2 minutes
Scenario 3: Application getting slower over time
User: "memory leak detection"
Server shows:
- payment-api-01: Memory growing 0.8% per minute
- payment-api-02: Memory growing 0.9% per minute
- Memory leak detected, restart recommended
Scenario 4: Need to show business impact to executives
User: "revenue impact for payment-api with 150 avg transaction"
Server calculates:
- 500 failed transactions in last hour
- Estimated revenue loss: $75,000
- Critical: Escalate immediately
Daily Health Check (2 minutes):
1. newrelic: comprehensive debug for <your-app>
2. newrelic: critical transactions for <your-app>
3. newrelic: resource saturation
After Every Deployment (30 seconds):
1. newrelic: deployment comparison for <app> before <time> and after <time>
2. newrelic: version errors for <app>
P1 Incident Response (5-10 minutes):
1. newrelic: comprehensive debug for <app>
2. newrelic: current failures for <app>
3. newrelic: error reasons for <app>
4. splunk: p1 root cause in production
5. newrelic: failed endpoints for <app>
Capacity Planning (5 minutes):
1. newrelic: capacity forecast for <app>
2. newrelic: resource saturation
3. newrelic: memory leak detection
4. newrelic: connection pool status for <app>
Universal query tool with automatic platform detection.
query(
text="What's the error rate for my-service?",
execute=False # Set True to execute
)
Generate or execute New Relic NRQL queries.
newrelic_query(
intent="error rate",
app_name="my-service",
time_range="1 hour ago"
)
Generate or execute Splunk SPL queries.
splunk_query(
intent="top errors",
index="production",
time_range="-24h"
)
Generate or execute kubectl commands.
kubectl_command(
intent="get logs",
namespace="production",
resource_name="my-pod"
)
View available query templates.
list_templates(platform="newrelic") # or "splunk", "kubectl", or None for all
Get reference documentation for a query language.
get_schema(platform="splunk")
Define any REST API in JSON format and it automatically becomes an MCP tool.
- Edit apis/custom_apis.json - Add your API definition with endpoints
- Call reload_custom_apis() or restart the server
# List available custom APIs
list_custom_apis()
# Get endpoint details with sample request/response
get_api_endpoint_info(api_id="github_issues", endpoint_id="list_issues")
# Call an API
call_custom_api(
api_id="github_issues",
endpoint_id="list_issues",
params={"owner": "microsoft", "repo": "vscode", "state": "open"}
)
# Reload after editing custom_apis.json
reload_custom_apis()
{
"apis": [
{
"id": "my_api",
"name": "My API",
"description": "Description here",
"enabled": true,
"base_url": "https://api.example.com",
"auth": {
"type": "bearer",
"token_env": "MY_API_TOKEN"
},
"endpoints": [
{
"id": "get_data",
"name": "Get Data",
"method": "GET",
"path": "/data/{id}",
"parameters": {
"path": {
"id": {"type": "string", "required": true}
},
"query": {
"limit": {"type": "integer", "default": 10}
}
},
"sample_request": {"id": "123", "limit": 5},
"sample_response": {"status": 200, "body": {...}}
}
]
}
]
}
Supported Auth Types:
- none - No authentication
- bearer - Bearer token from env var
- basic - Username/password from env vars
- header - Custom header with template
- headers - Multiple custom headers
- api_key - API key as query parameter
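As an illustration, a basic-auth API could be declared like this. The field names `username_env` and `password_env` are guesses by analogy with the `token_env` field shown in the example above; check the actual schema in apis/custom_apis.json before relying on them:

```json
{
  "auth": {
    "type": "basic",
    "username_env": "MY_API_USER",
    "password_env": "MY_API_PASS"
  }
}
```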
The server now includes 55+ production-ready templates across debugging, incident response, deployment analysis, capacity planning, and business metrics.
Just deployed? Check the impact:
newrelic: deployment comparison for payment-api before 14:30 and after 14:30
newrelic: version errors for payment-api
newrelic: rollback validation for payment-api
Got a P1 incident? Start here:
newrelic: comprehensive debug for payment-api
newrelic: current failures for payment-api
newrelic: error reasons for payment-api
splunk: p1 root cause in production
Memory or capacity issues:
newrelic: memory leak detection
newrelic: connection pool status for payment-api
newrelic: capacity forecast for payment-api
splunk: memory leak logs
Need to show business impact:
newrelic: revenue impact for payment-api with 150 avg transaction
newrelic: checkout funnel for ecommerce-app
newrelic: sla compliance for payment-api with 1000ms threshold
These help you debug production issues quickly - usually within 5 minutes you'll know what's happening.
Current state snapshot:
newrelic: current failures for payment-api
# Returns: failure count, failure rate %, P50/P95/P99 latency
Find what's broken:
newrelic: failed endpoints for payment-api
# Shows which endpoints have errors, sorted by worst first
Understand the errors:
newrelic: error types for payment-api
newrelic: error reasons for payment-api
# Get error classes, messages, HTTP status codes
Check performance:
newrelic: latency metrics for payment-api
newrelic: endpoint performance for payment-api
# P50/P95/P99 latency per endpoint with timeline
See overall health:
newrelic: availability for payment-api
newrelic: debug all for payment-api
# Uptime %, success rate, comprehensive metrics
Check dependencies:
newrelic: service calls for payment-api
newrelic: database queries for payment-api
# External service health, slow database queries
For critical production incidents - these get you to root cause in under 10 mins.
Critical metrics dashboard:
newrelic: critical metrics for payment-api
# Error rate, throughput, latency, apdex - all in one
Error spike investigation:
newrelic: error spike analysis for payment-api
splunk: p1 error timeline in production
# When errors started, error types, timeline
Find affected services:
newrelic: affected endpoints for payment-api
splunk: p1 affected services in production
# Which endpoints failing, error rates
Check external dependencies:
newrelic: external services for payment-api
splunk: p1 dependency failures in production
# Upstream/downstream service health
Database bottlenecks:
newrelic: database performance for payment-api
splunk: p1 database issues in production
# Slow queries, connection issues
Infrastructure problems:
newrelic: host resources
splunk: p1 resource exhaustion in production
# CPU, memory, disk usage
Trace a specific request:
newrelic: trace request 550e8400-e29b-41d4-a716-446655440000
splunk: p1 request tracing in production with trace abc123
# Follow request through all services
Most P1 incidents are caused by deployments - these help you validate releases and make rollback decisions in seconds.
Compare before/after deployment:
newrelic: deployment comparison for payment-api before 14:30 and after 14:30
# See if error rate or latency increased after deploy
Check which version has issues:
newrelic: version errors for payment-api
# Compare error rates across versions (v2.3.1 vs v2.3.0)
Validate a rollback worked:
newrelic: rollback validation for payment-api
# Confirms metrics returned to normal after rollback
Monitor canary deployment:
newrelic: canary health for payment-api
# Compare canary vs production metrics to decide if safe to promote
Find recent changes:
newrelic: recent changes for payment-api
splunk: deployment logs
# What changed recently, correlate with error spikes
View deployment history:
newrelic: deployment timeline for payment-api
splunk: version error logs
# Timeline of deployments and their impact
Prevent incidents before they happen - detect memory leaks, pool exhaustion, and capacity issues early.
Detect memory leaks:
newrelic: memory leak detection
splunk: memory leak logs
# Growing memory trend, OutOfMemory errors
Thread pool saturation:
newrelic: thread pool for payment-api
# Active threads, queued tasks, utilization %
Connection pool health:
newrelic: connection pool status for payment-api
splunk: connection pool logs
# DB/cache pool utilization, timeouts
Message queue backlog:
newrelic: queue depth
splunk: queue backlog logs
# Queue depth, publish/consume rates
Rate limiting issues:
newrelic: rate limiting for payment-api
splunk: rate limit logs
# 429 errors, which endpoints rate limited
Autoscaling events:
newrelic: autoscale triggers
# When and why instances scaled up/down
Capacity forecasting:
newrelic: capacity forecast for payment-api
# Project capacity needs for next few hours
Find resource bottlenecks:
newrelic: resource saturation
# Which hosts hitting CPU/memory/disk limits
Translate technical issues to business impact - what executives actually care about.
Checkout flow health:
newrelic: checkout funnel for ecommerce-app
splunk: checkout errors
# Success rate at each checkout step, where users drop off
Calculate revenue loss:
newrelic: revenue impact for payment-api with 200 avg transaction
# Estimated $ lost from failed transactions
Critical transaction monitoring:
newrelic: critical transactions for payment-api
splunk: critical transaction logs
# Payment, order, purchase success rates
SLA tracking:
newrelic: sla compliance for payment-api with 1000ms threshold
splunk: sla breach logs
# % meeting SLA, breach timeline
User journey analysis:
newrelic: user journey for ecommerce-app
splunk: user journey logs
# Home > Product > Cart > Checkout > Confirm funnel
Cart abandonment:
newrelic: cart abandonment for ecommerce-app
# Abandonment rate correlated with errors
Conversion impact:
newrelic: conversion impact for ecommerce-app
# Conversion rate vs error rate correlation
The router automatically detects which platform to use based on your query:
These route to New Relic:
- "what's the error rate for my application?"
- "show me latency metrics"
- "deployment comparison for payment-api"
- "memory leak detection"
- "revenue impact for checkout"
These route to Splunk:
- "show me error logs from production"
- "deployment logs in the last 2 hours"
- "connection pool exhausted errors"
- "p1 root cause analysis"
These route to Kubernetes:
- "list all pods in production namespace"
- "get logs for payment-pod"
- "describe service api-gateway"
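The routing examples above can be sketched as a keyword classifier. This is a minimal illustration, assuming a hypothetical keyword table; the server's actual router may use a richer classifier:

```python
# Hypothetical keyword table: checked in order, first match wins.
# kubectl goes first so "get logs for payment-pod" is not claimed by Splunk.
KEYWORDS = {
    "kubectl": ("pod", "pods", "namespace", "kubectl", "describe"),
    "splunk": ("logs", "log", "root cause", "p1"),
    "newrelic": ("error rate", "latency", "deployment", "memory leak", "revenue"),
}

def route(text: str) -> str:
    """Return the platform whose keywords appear first in the query."""
    lowered = text.lower()
    for platform, words in KEYWORDS.items():
        if any(word in lowered for word in words):
            return platform
    return "newrelic"  # fall back to metrics when nothing matches
```

For example, `route("get logs for payment-pod")` routes to kubectl because the pod keywords are checked before the log keywords.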
| Category | Templates | Use For | Time to Insight |
|---|---|---|---|
| Debug | 14 | Current state, errors, latency | 30 seconds - 2 mins |
| P1 Incident | 9 | Critical outages, spike analysis | 5-10 mins to root cause |
| Deployment | 6 | Deploy validation, rollback decisions | 30 seconds |
| Capacity | 8 | Memory leaks, resource planning | 2-5 mins |
| Business | 7 | Revenue impact, SLA tracking | 1-2 mins |
| Splunk Logs | 19 | Log analysis, error context | 1-3 mins |
Total: 55+ production-ready templates
When a critical incident occurs, follow this systematic approach:
Step 1: Immediate Assessment (30 seconds)
newrelic: comprehensive debug for <app>
This gives you the complete picture - error rate, latency, throughput, apdex.
Step 2: Identify What's Broken (1 minute)
newrelic: current failures for <app>
newrelic: failed endpoints for <app>
Now you know which specific endpoints are failing.
Step 3: Understand Why (2 minutes)
newrelic: error reasons for <app>
splunk: p1 root cause in production
Get the specific error messages, HTTP status codes, and stack traces.
Step 4: Check Recent Changes (1 minute)
newrelic: recent changes for <app>
splunk: deployment logs
Was there a recent deployment? Configuration change?
Step 5: Make Decision (1 minute)
- If deployment-related: Rollback immediately
- If dependency-related: Check external services
- If capacity-related: Scale up resources
Total time to root cause: 5-10 minutes
After every deployment, validate health in 30 seconds:
Check impact immediately:
newrelic: deployment comparison for <app> before <deploy-time> and after <deploy-time>
Decision criteria:
- Error rate increased >2x → Rollback
- Latency increased >2x → Rollback
- Error rate increased 1.5-2x → Monitor closely
- Metrics similar or better → Deploy successful
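The decision criteria above reduce to comparing before/after ratios. A minimal sketch (function name and shape are illustrative, not part of the server):

```python
def deploy_decision(before_err: float, after_err: float,
                    before_p95: float, after_p95: float) -> str:
    """Apply the rollback criteria: >2x worse -> rollback, 1.5-2x -> monitor."""
    err_ratio = after_err / before_err if before_err else float("inf")
    lat_ratio = after_p95 / before_p95 if before_p95 else float("inf")
    worst = max(err_ratio, lat_ratio)
    if worst > 2.0:
        return "rollback"
    if worst >= 1.5:
        return "monitor closely"
    return "deploy successful"
```

Using the Scenario 1 numbers, `deploy_decision(0.5, 5.0, 200, 450)` returns "rollback".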
Validate the rollback:
newrelic: rollback validation for <app>
Confirms metrics returned to normal.
When application performance degrades over time:
Detect the leak (2 minutes):
newrelic: memory leak detection
Look for memory growth >0.5% per minute.
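The >0.5%/minute rule amounts to fitting a slope through memory samples. A sketch, assuming samples of (minute, memory %) pairs; the template itself computes this server-side:

```python
def memory_growth_per_minute(samples):
    """Least-squares slope of (minute, memory_pct) samples, in %-points/min."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return num / den

def looks_like_leak(samples, threshold=0.5):
    """True when memory grows faster than the threshold per minute."""
    return memory_growth_per_minute(samples) > threshold
```

The Scenario 3 host growing 0.8%/minute would trip this check; a host oscillating around a flat baseline would not.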
Get error details:
splunk: memory leak logs
Find OutOfMemory errors, heap warnings.
Check related resources:
newrelic: thread pool for <app>
newrelic: connection pool status for <app>
Decision:
- Memory growing steadily → Memory leak, restart required
- Thread pool exhausted → Increase thread pool or scale
- Connection pool saturated → Increase pool size
When executives ask "how much is this costing us?":
Calculate revenue impact:
newrelic: revenue impact for <app> with <avg-transaction-value> avg transaction
Example output: 500 failed transactions × $150 = $75,000 lost
Show conversion impact:
newrelic: conversion impact for <app>
Compare today's conversion rate vs baseline.
Identify broken user journeys:
newrelic: checkout funnel for <app>
newrelic: user journey for <app>
Where are users dropping off?
Escalation thresholds:
- Revenue loss >$10k/hour → Page executives
- Checkout success <95% → Critical priority
- Conversion rate drop >20% → Immediate investigation
Before major events (Black Friday, product launch):
Forecast capacity needs:
newrelic: capacity forecast for <app>
See current throughput trend and projected load.
Check current resource usage:
newrelic: resource saturation
Which resources will hit limits first?
Validate autoscaling:
newrelic: autoscale triggers
Ensure autoscaling is configured properly.
Check connection pools:
newrelic: connection pool status for <app>
Make sure database can handle increased load.
Pre-emptive actions:
- Scale up if CPU/memory >70% during normal load
- Increase connection pools if utilization >60%
- Add read replicas if database queries slow
Rollback immediately if:
- New version error rate >2x old version
- New version latency >2x old version
- New version apdex score <0.5
- Critical transaction success rate <95%
- Revenue-impacting errors detected
Monitor closely if:
- Error rate increased 1.5-2x (prepare for rollback)
- Latency increased 1.5-2x (investigate)
- Minor increase in errors but not critical endpoints
Critical (take action now):
- CPU >90%
- Memory >90% or growing >0.5%/minute
- Thread pool utilization >90%
- Connection pool utilization >95%
- Queue depth growing continuously
Warning (plan action):
- CPU 75-90%
- Memory 75-90%
- Thread pool 75-90%
- Connection pool 85-95%
- Queue depth stable but high
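The CPU/memory tiers above can be expressed as a small classifier. A sketch only; thread pools, connection pools, and queue depth would follow the same pattern with their own thresholds:

```python
def resource_status(cpu: float, memory: float,
                    mem_growth_per_min: float = 0.0) -> str:
    """Classify host resource usage against the critical/warning tiers above."""
    if cpu > 90 or memory > 90 or mem_growth_per_min > 0.5:
        return "critical"
    if cpu >= 75 or memory >= 75:
        return "warning"
    return "ok"
```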
Critical severity (page executives):
- Revenue loss >$50k/hour
- Checkout success <90%
- Payment success <95%
- Major customer impact
High severity (page senior engineers):
- Revenue loss $10k-50k/hour
- Checkout success 90-95%
- Payment success 95-98%
- Multiple customers affected
Medium severity (standard response):
- Revenue loss <$10k/hour
- Checkout success >95%
- Payment success >98%
- Limited customer impact
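The three severity tiers map directly to threshold checks. A sketch of that mapping (customer-impact signals are omitted since they are judgment calls, not numbers):

```python
def business_severity(loss_per_hour: float, checkout_pct: float,
                      payment_pct: float) -> str:
    """Map business metrics to the critical/high/medium tiers above."""
    if loss_per_hour > 50_000 or checkout_pct < 90 or payment_pct < 95:
        return "critical"
    if loss_per_hour > 10_000 or checkout_pct < 95 or payment_pct < 98:
        return "high"
    return "medium"
```

The Scenario 4 estimate ($75k/hour) would classify as critical and page executives.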
# Compare metrics before/after deploy
newrelic: deployment comparison for payment-api before 14:30 and after 14:30
# Check error rates by version
newrelic: version errors for payment-api
# Validate rollback restored health
newrelic: rollback validation for payment-api
# Monitor canary deployment
newrelic: canary health for payment-api
# See what changed recently
newrelic: recent changes for payment-api
splunk: deployment logs
# Quick health check
newrelic: comprehensive debug for payment-api
# Current failure metrics
newrelic: current failures for payment-api
# Which endpoints broken
newrelic: failed endpoints for payment-api
# Why they're failing
newrelic: error reasons for payment-api
newrelic: error types for payment-api
# Performance analysis
newrelic: latency metrics for payment-api
newrelic: endpoint performance for payment-api
# Dependency health
newrelic: service calls for payment-api
newrelic: database queries for payment-api
# Detect memory leaks
newrelic: memory leak detection
splunk: memory leak logs
# Check resource usage
newrelic: resource saturation
newrelic: thread pool for payment-api
newrelic: connection pool status for payment-api
# Queue analysis
newrelic: queue depth
splunk: queue backlog logs
# Capacity planning
newrelic: capacity forecast for payment-api
newrelic: autoscale triggers
# Rate limiting
newrelic: rate limiting for payment-api
splunk: rate limit logs
# Revenue calculation
newrelic: revenue impact for payment-api with 150 avg transaction
# Funnel analysis
newrelic: checkout funnel for ecommerce-app
newrelic: user journey for ecommerce-app
# Transaction monitoring
newrelic: critical transactions for payment-api
splunk: critical transaction logs
# SLA tracking
newrelic: sla compliance for payment-api with 1000ms threshold
splunk: sla breach logs
# Conversion analysis
newrelic: conversion impact for ecommerce-app
newrelic: cart abandonment for ecommerce-app
# Initial assessment
newrelic: critical metrics for payment-api
newrelic: comprehensive debug for payment-api
# Error investigation
newrelic: error spike analysis for payment-api
splunk: p1 error timeline in production
# Service health
newrelic: affected endpoints for payment-api
splunk: p1 affected services in production
# Dependency check
newrelic: external services for payment-api
splunk: p1 dependency failures in production
# Infrastructure
newrelic: host resources
splunk: p1 resource exhaustion in production
# Request tracing
newrelic: trace request <trace-id>
splunk: p1 request tracing in production with trace <trace-id>
Diagnosis:
newrelic: deployment comparison for <app> before <time> and after <time>
newrelic: latency metrics for <app>
newrelic: database queries for <app>
Common causes:
- New code introduced N+1 queries
- Database connection pool too small
- External API calls not optimized
- Memory leak causing GC pressure
Solution:
- Rollback if latency >2x baseline
- Check database query performance
- Increase connection pool if saturated
- Monitor memory growth
Diagnosis:
newrelic: error timeline for <app>
newrelic: error reasons for <app>
splunk: p1 error timeline in production
Common causes:
- Connection pool exhaustion (errors every N minutes)
- Rate limiting from external service
- Memory pressure causing timeouts
- Load balancer health check failures
Solution:
- Check connection pool utilization
- Review rate limiting logs
- Verify autoscaling thresholds
- Check load balancer configuration
Diagnosis:
newrelic: checkout funnel for ecommerce-app
newrelic: revenue impact for payment-api with <avg-value> avg transaction
splunk: checkout errors
Common causes:
- Payment gateway timeout
- Session expiration
- Inventory service unavailable
- Database connection issues
Solution:
- Check payment gateway health
- Verify session timeout settings
- Test inventory service
- Scale database connections
Diagnosis:
newrelic: memory leak detection
newrelic: thread pool for <app>
splunk: memory leak logs
Common causes:
- Memory leak in application code
- Connection leak (unclosed connections)
- Large object retention
- Thread leak
Solution:
- Restart affected instances
- Heap dump analysis
- Review recent code changes
- Check connection pool settings
Common parameters used across templates:
app_name (required in most queries)
- Your application name in New Relic
- Example: payment-api, ecommerce-app, user-service
time_range (optional, varies by template)
- New Relic: "5 minutes ago", "1 hour ago", "1 day ago"
- Splunk: "-5m", "-1h", "-24h"
- Default varies by template (5min for current state, 1h for analysis)
deployment_time (required for deployment comparison)
- Epoch timestamp or ISO format
- Example: 1642604400 or "14:30"
avg_transaction_value (optional, for revenue impact)
- Average dollar value per transaction
- Example: 100, 150, 200
- Default: 100
sla_threshold (optional, for SLA compliance)
- Response time threshold in milliseconds
- Example: 500, 1000, 2000
- Default: 1000
cpu_threshold / memory_threshold
- Resource utilization percentage
- Example: 80, 85, 90
- Defaults: cpu=80, memory=85
error_threshold (optional, for filtering)
- Error rate percentage threshold
- Example: 1, 5, 10
- Default: 1 (shows endpoints with >1% errors)
slow_threshold (optional, for database queries)
- Query duration threshold in milliseconds
- Example: 100, 500, 1000
- Default: 100
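To illustrate how these parameters feed a template, here is a minimal rendering sketch. The NRQL text and the `render` helper are hypothetical, not the server's actual implementation:

```python
# Hypothetical error-rate template using the app_name and time_range
# parameters documented above.
TEMPLATE = (
    "SELECT percentage(count(*), WHERE error IS true) AS error_rate "
    "FROM Transaction WHERE appName = '{app_name}' SINCE {time_range}"
)

def render(template: str, **params: str) -> str:
    """Fill a query template, falling back to per-template defaults."""
    defaults = {"time_range": "1 hour ago"}
    return template.format(**{**defaults, **params})
```

For example, `render(TEMPLATE, app_name="payment-api")` produces a query scoped to payment-api over the default last hour.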
Tip 1: Combine New Relic + Splunk
- New Relic tells you WHAT is happening (metrics)
- Splunk tells you WHY (logs, errors, stack traces)
- Always use both for complete root cause analysis
Tip 2: Establish Baselines
- Know your normal error rate (e.g., 0.5%)
- Know your normal p95 latency (e.g., 300ms)
- Know your normal conversion rate (e.g., 3.2%)
- Deviations become obvious when you have baselines
Tip 3: Run Deployment Checks Automatically After every deploy:
- Wait 5 minutes for metrics to stabilize
- Run deployment comparison
- Automatic rollback if error rate >2x
Tip 4: Use Time Ranges Strategically
- Current state: 5 minutes (what's happening now)
- Troubleshooting: 30 minutes - 1 hour (enough context)
- Analysis: 1-4 hours (see patterns)
- Capacity planning: 4-24 hours (trends)
Tip 5: Start Broad, Then Focus
- Start with comprehensive debug (all metrics)
- Identify problem area (errors vs latency vs capacity)
- Use specific template for deep dive
- Get logs from Splunk for details
Tip 6: Revenue First for Executives When reporting to executives:
- Lead with business impact ($75k revenue loss)
- Then technical details (payment gateway timeout)
- Then action plan (switching to backup gateway)
- Avoid technical jargon in initial report
┌────────────────────────────────────────────────────────┐
│ MCP Server │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Intent Router │ │
│ │ (Classifies queries → platforms) │ │
│ └─────────────────────────────────────────────────┘ │
│ │ │
│ ┌────────────────┼────────────────┐ │
│ ▼ ▼ ▼ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ NerdGraph │ │ Splunk │ │ kubectl │ │
│ │ Adapter │ │ Adapter │ │ Adapter │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ GraphQL │ │ REST API │ │ CLI │ │
│ │ API │ │ │ │ Subprocess │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
└────────────────────────────────────────────────────────┘
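The adapter layer in the diagram above can be sketched as a small interface. Class and method names here are illustrative, not the actual classes in src/:

```python
from abc import ABC, abstractmethod

class Adapter(ABC):
    """Translates a classified intent into a platform-specific query."""

    @abstractmethod
    def generate(self, intent: str, **params) -> str:
        ...

class KubectlAdapter(Adapter):
    """Maps intents to kubectl invocations (run via subprocess in practice)."""

    def generate(self, intent: str, **params) -> str:
        if intent == "get logs":
            return f"kubectl logs {params['resource_name']} -n {params['namespace']}"
        raise ValueError(f"unknown intent: {intent}")
```

The NerdGraph and Splunk adapters would implement the same interface, emitting NRQL and SPL respectively.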
pytest tests/
ruff check src/
ruff format src/
License: MIT