A powerful, enterprise-grade uptime monitoring system with granular notification control, group-based monitoring strategies, custom metrics tracking, and comprehensive alerting across multiple channels.
- Pulse-Based Monitoring - Receive heartbeat signals from your services
- Group-Based Strategies - Organize monitors into hierarchical groups with flexible health strategies
- Custom Metrics - Track up to 3 custom decimal values per monitor (e.g., player count, memory usage, CPU load)
- Granular Notifications - Channel-based notification system with per-monitor/group control
- Real-Time Status Pages - Multiple customizable status pages for different audiences
- High Performance - Built with Bun and ClickHouse for maximum throughput
- Production Ready - Comprehensive validation, error handling, and monitoring
| Project | Description |
|---|---|
| UptimeMonitor-StatusPage | Self-hosted status page frontend with real-time updates |
| PulseMonitor | Automated pulse sending client |
Note: This repository contains only the backend server. To display a public status page, you'll also need to deploy UptimeMonitor-StatusPage.
- Clone the repository and navigate to the project directory:

      git clone https://github.com/Rabbit-Company/UptimeMonitor-Server.git
      cd UptimeMonitor-Server

- Edit your configuration file:

  Edit `config.toml` with your monitors, notification channels, and other settings (see Configuration below).

- Start the services:

      docker compose up -d

  This will start:
  - Uptime Monitor on port 3000
  - ClickHouse database (internal, not exposed)

- Verify the deployment:

      # Check service status
      docker compose ps

      # View logs
      docker compose logs -f uptimemonitor

      # Test the health endpoint
      curl http://localhost:3000/health
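
The verification step can also be scripted, for example in a deploy pipeline. The following is a minimal Python sketch, assuming that an HTTP 200 response from `/health` means the server is ready (the response body is not inspected). The function name `wait_for_health` and the retry values are illustrative and not part of the project.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(url, attempts=10, delay=2.0, opener=urllib.request.urlopen):
    """Poll `url` until it answers with HTTP 200 or the attempts run out.

    `opener` is injectable so the function can be exercised without a server.
    Assumption: a 200 response means the service is healthy.
    """
    for attempt in range(1, attempts + 1):
        try:
            with opener(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not reachable yet; retry after the delay
        if attempt < attempts:
            time.sleep(delay)
    return False


# Example call (requires the stack to be running):
# wait_for_health("http://localhost:3000/health")
```

The call returns `True` as soon as one request succeeds and `False` once every attempt has failed, so it can gate later pipeline steps.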
# Database Configuration
[clickhouse]
url = "http://uptime_user:uptime_password@localhost:8123/uptime_monitor"
# Server Configuration
[server]
port = 3000
# Available options: "direct" (no proxy), "cloudflare", "aws", "gcp", "azure", "vercel", "nginx", "development"
proxy = "direct"
# Optional: Set your own reload token. If not provided, one will be auto-generated at startup.
#reloadToken = ""
# Logging Configuration
[logger]
level = 4 # 0=silent, 7=verbose
[missingPulseDetector]
# Check interval in seconds for detecting missing pulses (default: 5)
# Lower values detect outages faster but increase CPU usage
interval = 5
[selfMonitoring]
# Enable self-monitoring and automatic backfill
enabled = true
# ID of the self-monitor
id = "self-monitor"
# Health check interval in seconds (default: 3)
# Lower values detect outages faster but increase database load
interval = 3
# Backfill synthetic pulses for monitors that were healthy before downtime
# This prevents false downtime reports when the monitoring system itself is down
backfillOnRecovery = true
# Strategy for synthetic pulse latency:
# - "last-known": Use the last latency from before the downtime
# - "null": Don't set latency for synthetic pulses
latencyStrategy = "last-known"
# Monitor Definition
[[monitors]]
id = "api-prod"
name = "Production API"
token = "secure-random-token"
interval = 60 # Expect pulse every 60 seconds
maxRetries = 3
resendNotification = 12
notificationChannels = ["critical"]
# Monitor with Custom Metrics
[[monitors]]
id = "game-server"
name = "Game Server"
token = "tk_game_server_xyz999"
interval = 10
maxRetries = 0
resendNotification = 12
groupId = "production"
notificationChannels = []
# Define custom metric 1
[monitors.custom1]
id = "players" # Used in API requests as query parameter name
name = "Player Count" # Human-readable display name
unit = "players" # Optional unit for display
# Define custom metric 2
[monitors.custom2]
id = "tps"
name = "Ticks Per Second"
unit = "TPS"
# Define custom metric 3
[monitors.custom3]
id = "memory"
name = "Memory Usage"
unit = "MB"
# Group Definition
[[groups]]
id = "production"
name = "Production Services"
strategy = "any-up" # UP if any child is up
degradedThreshold = 90
notificationChannels = ["critical"]
# Status Page
[[status_pages]]
id = "public"
name = "Service Status"
slug = "status"
items = ["production"]
# Notifications
# Critical Discord Notifications
[notifications.channels.critical]
id = "critical"
name = "Critical Production Alerts"
enabled = true
[notifications.channels.critical.discord]
enabled = true
webhookUrl = "https://discord.com/api/webhooks/YOUR_WEBHOOK_URL"
username = "Critical Alert Bot"
[notifications.channels.critical.discord.mentions]
everyone = true
roles = ["187949199596191745"] # @DevOps role ID
# Critical Email Notifications
[notifications.channels.critical.email]
enabled = true
from = '"Rabbit Company" <info@rabbit-company.com>'
to = [""]
[notifications.channels.critical.email.smtp]
host = "mail.rabbit-company.com"
port = 465
secure = true
user = "info@rabbit-company.com"
pass = ""

The pulse endpoint (`/v1/push/:token`) accepts the following optional query parameters:
- `latency` - Response time in milliseconds (capped at 600000 ms, i.e. 10 minutes)
- `startTime` - When the check started (ISO format or Unix timestamp)
- `endTime` - When the check completed (ISO format or Unix timestamp)
- Custom metric parameters - Use the configured `id` value for each custom metric

Timing values are resolved as follows:
- If both `startTime` and `endTime` are provided, latency is calculated automatically
- If `startTime` and `latency` are provided, `endTime` is calculated
- If `endTime` and `latency` are provided, `startTime` is calculated
- If only `latency` is provided, `endTime` is set to the current time and `startTime` is calculated
- If no parameters are provided, the pulse is recorded with the current timestamp
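
The resolution rules above can be sketched in Python as follows. This is an illustration of the documented behavior, not the server's actual code. It makes several assumptions: numeric timestamps are in milliseconds (as in the 13-digit examples in this README), the latency cap is applied before other values are derived, and a pulse without parameters gets the current time as both start and end. Combinations the list does not cover (only `startTime`, or only `endTime`) are left untouched. The names `parse_time_ms` and `resolve_timing` are hypothetical.

```python
import time
from datetime import datetime, timezone

MAX_LATENCY_MS = 600_000.0  # cap stated above: 10 minutes


def parse_time_ms(value):
    """Convert an ISO 8601 string or a Unix timestamp to epoch milliseconds.

    Assumption: numeric timestamps are already in milliseconds.
    """
    try:
        return float(value)
    except ValueError:
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
        if parsed.tzinfo is None:
            parsed = parsed.replace(tzinfo=timezone.utc)  # assume UTC
        return parsed.timestamp() * 1000.0


def resolve_timing(start=None, end=None, latency=None, now_ms=None):
    """Fill in whichever of start, end, and latency can be derived."""
    now = time.time() * 1000.0 if now_ms is None else now_ms
    start_ms = parse_time_ms(start) if start is not None else None
    end_ms = parse_time_ms(end) if end is not None else None
    # Assumption: the cap is applied to a supplied latency before derivation.
    latency_ms = min(float(latency), MAX_LATENCY_MS) if latency is not None else None

    if start_ms is not None and end_ms is not None:
        if latency_ms is None:
            latency_ms = min(end_ms - start_ms, MAX_LATENCY_MS)
    elif start_ms is not None and latency_ms is not None:
        end_ms = start_ms + latency_ms
    elif end_ms is not None and latency_ms is not None:
        start_ms = end_ms - latency_ms
    elif latency_ms is not None:
        end_ms = now
        start_ms = now - latency_ms
    elif start_ms is None and end_ms is None:
        # No parameters: assumption that start and end both take "now".
        start_ms = end_ms = now

    return {"start": start_ms, "end": end_ms, "latency": latency_ms}
```

For example, `resolve_timing(start="1736928000000", latency="1500")` yields an `end` value 1500 ms after the start, which matches the "start time and latency" case in the list.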
# Replace :token with your monitor's token. URLs are quoted so the shell
# does not interpret "&" or "?".

# Simple pulse with no timing data
curl -X GET "http://localhost:3000/v1/push/:token"

# Send a pulse with latency (milliseconds)
curl -X GET "http://localhost:3000/v1/push/:token?latency=125.5"

# RECOMMENDED: Send all three parameters for maximum accuracy
curl -X GET "http://localhost:3000/v1/push/:token?startTime=2025-10-15T10:00:00Z&endTime=2025-10-15T10:00:01.500Z&latency=1500"

# Send a pulse with start and end times (latency calculated automatically)
curl -X GET "http://localhost:3000/v1/push/:token?startTime=2025-10-15T10:00:00Z&endTime=2025-10-15T10:00:01Z"

# Send a pulse with Unix timestamps
curl -X GET "http://localhost:3000/v1/push/:token?startTime=1736928000000&endTime=1736928001500"

# Send a pulse with start time and latency (end time calculated)
curl -X GET "http://localhost:3000/v1/push/:token?startTime=2025-10-15T10:00:00Z&latency=1500"

Custom metrics allow you to track additional numeric values alongside your pulses. Each monitor can have up to 3 custom metrics configured.
Define custom metrics in your monitor configuration:
[[monitors]]
id = "game-server"
name = "Game Server"
token = "tk_game_server_xyz999"
interval = 10
maxRetries = 0
resendNotification = 12
[monitors.custom1]
id = "players" # Query parameter name
name = "Player Count" # Display name
unit = "players" # Optional unit
[monitors.custom2]
id = "tps"
name = "Ticks Per Second"
unit = "TPS"
[monitors.custom3]
id = "memory"
name = "Memory Usage"
unit = "MB"

Use the configured `id` as the query parameter name:
# Send pulse with player count
curl -X GET "http://localhost:3000/v1/push/tk_game_server_xyz999?players=30"
# Send pulse with latency and all custom metrics
curl -X GET "http://localhost:3000/v1/push/tk_game_server_xyz999?latency=50&players=30&tps=19.8&memory=2048.5"
# You can also use generic names (custom1, custom2, custom3)
curl -X GET "http://localhost:3000/v1/push/tk_game_server_xyz999?custom1=30&custom2=19.8&custom3=2048.5"

Custom metrics are included in status page and history responses:
{
"id": "game-server",
"type": "monitor",
"name": "Game Server",
"status": "up",
"latency": 50,
"custom1": {
"config": {
"id": "players",
"name": "Player Count",
"unit": "players"
},
"value": 30
},
"custom2": {
"config": {
"id": "tps",
"name": "Ticks Per Second",
"unit": "TPS"
},
"value": 19.8
},
"custom3": {
"config": {
"id": "memory",
"name": "Memory Usage",
"unit": "MB"
},
"value": 2048.5
}
}

Custom metrics are aggregated in history data with min, max, and avg values:
{
"monitorId": "game-server",
"type": "hourly",
"customMetrics": {
"custom1": { "id": "players", "name": "Player Count", "unit": "players" },
"custom2": { "id": "tps", "name": "Ticks Per Second", "unit": "TPS" },
"custom3": { "id": "memory", "name": "Memory Usage", "unit": "MB" }
},
"data": [
{
"timestamp": "2025-01-08T14:00:00Z",
"uptime": 100,
"latency_min": 45,
"latency_max": 120,
"latency_avg": 67.5,
"custom1_min": 10,
"custom1_max": 150,
"custom1_avg": 45.3,
"custom2_min": 18.5,
"custom2_max": 20.0,
"custom2_avg": 19.7,
"custom3_min": 1800,
"custom3_max": 2500,
"custom3_avg": 2100.5
}
]
}

- Game Servers: Track player count, TPS, memory usage
- Database Servers: Track active connections, query rate, replication lag
- Web Servers: Track request rate, error rate, queue depth
- CDN/Cache: Track hit rate, bandwidth, origin requests
- Message Queues: Track queue depth, consumer lag, throughput
- IoT Devices: Track sensor readings, battery level, signal strength
For automated pulse sending we recommend using PulseMonitor.
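
Where a dedicated client is not an option, a pulse can also be sent from a few lines of script. The sketch below builds the push URL with `latency` and custom metric parameters, then sends it with a plain GET request. It assumes the server runs at `http://localhost:3000` and uses the `game-server` token from the configuration example. The helper names `build_pulse_url` and `send_pulse` are illustrative, not part of the project.

```python
import urllib.parse
import urllib.request


def build_pulse_url(base_url, token, latency=None, metrics=None):
    """Return the push URL for one pulse, with optional query parameters.

    `metrics` maps a configured custom metric id (e.g. "players") to a number.
    """
    params = {}
    if latency is not None:
        params["latency"] = latency
    params.update(metrics or {})
    url = f"{base_url.rstrip('/')}/v1/push/{urllib.parse.quote(token, safe='')}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def send_pulse(url, timeout=5):
    """Send the pulse with a GET request and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


# Example (requires a running server):
# url = build_pulse_url("http://localhost:3000", "tk_game_server_xyz999",
#                       latency=50, metrics={"players": 30, "tps": 19.8})
# send_pulse(url)
```

Building the query string with `urlencode` avoids the shell-quoting pitfalls of hand-written URLs and keeps metric names in one dictionary.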