Netdata.cloud bot for Zulip

Netdata Zulip Bot

100% vibe coded, use at your peril

A webhook service that receives notifications from Netdata Cloud and forwards them to Zulip channels. It serves HTTPS using Let's Encrypt certificates and uses mutual TLS to authenticate requests from Netdata Cloud.

Features

  • 🔐 Automated SSL Certificates: Built-in Let's Encrypt integration with automatic renewal
  • 🤝 Mutual TLS: Secure authentication with Netdata Cloud
  • 📊 Rich Formatting: Beautiful Zulip messages with emojis and markdown
  • 🏷️ Topic Organization: Automatic topic routing by severity level
  • 📝 Structured Logging: JSON-structured logs for monitoring
  • High Performance: FastAPI-based webhook endpoint
  • 🚀 Standalone: No external dependencies like certbot required

Quick Start

1. Install Dependencies

# Using uv (recommended)
uv sync

# Or using pip
pip install -e .

2. Create Configuration

# Generate sample configuration files
netdata-zulip-bot --create-config

# Copy and customize
cp .zuliprc.sample ~/.zuliprc

3. Configure Zulip Settings

Edit ~/.zuliprc:

[api]
site=https://yourorg.zulipchat.com
email=netdata-bot@yourorg.zulipchat.com
key=your-zulip-api-key
stream=netdata-alerts

4. Set Server Environment Variables

export SERVER_DOMAIN=your-webhook-domain.com
export SERVER_PORT=8443
export SERVER_ENABLE_MTLS=true

# For automated SSL certificates (recommended)
export SERVER_AUTO_CERT=true
export SERVER_CERT_EMAIL=admin@example.com
# Set to true to use the Let's Encrypt staging server while testing (optional)
export SERVER_CERT_STAGING=false

5. Run the Service

# With automated SSL certificates
netdata-zulip-bot

# The bot will automatically:
# 1. Obtain SSL certificates from Let's Encrypt
# 2. Start the HTTPS server
# 3. Renew certificates before expiration

Configuration

Zulip Configuration

The bot supports two configuration methods:

Method 1: zuliprc File

Create ~/.zuliprc:

[api]
site=https://yourorg.zulipchat.com
email=netdata-bot@yourorg.zulipchat.com
key=your-zulip-api-key
stream=netdata-alerts

Method 2: Environment Variables

export ZULIP_SITE=https://yourorg.zulipchat.com
export ZULIP_EMAIL=netdata-bot@yourorg.zulipchat.com
export ZULIP_API_KEY=your-api-key
export ZULIP_STREAM=netdata-alerts

Pass the --env-config flag to read these environment variables instead of ~/.zuliprc.
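
The two methods could be loaded along the following lines. This is a minimal sketch using only the standard library, not the bot's actual loader: the helper names `load_zuliprc` and `load_zulip_env` and the returned dict layout are assumptions made for illustration.

```python
import configparser
import os
from pathlib import Path

# Keys expected in the [api] section, taken from the sample zuliprc above.
REQUIRED_KEYS = ("site", "email", "key", "stream")


def load_zuliprc(path="~/.zuliprc"):
    """Read the four settings from the [api] section (hypothetical helper)."""
    # Interpolation is disabled so a "%" in a value is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(Path(path).expanduser()):
        raise FileNotFoundError(f"cannot read {path}")
    api = parser["api"]
    missing = [k for k in REQUIRED_KEYS if not api.get(k)]
    if missing:
        raise ValueError("missing zuliprc keys: " + ", ".join(missing))
    return {k: api[k] for k in REQUIRED_KEYS}


def load_zulip_env(env=os.environ):
    """Read the same settings from ZULIP_* variables (hypothetical helper)."""
    names = {
        "site": "ZULIP_SITE",
        "email": "ZULIP_EMAIL",
        "key": "ZULIP_API_KEY",
        "stream": "ZULIP_STREAM",
    }
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))
    return {k: env[var] for k, var in names.items()}
```

Both helpers return the same dict shape, so the rest of a program would not need to know which method was used.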

Server Configuration

Set these environment variables:

  • SERVER_DOMAIN: Your public domain (required)
  • SERVER_HOST: Bind address (default: 0.0.0.0)
  • SERVER_PORT: HTTPS port (default: 8443)
  • SERVER_ENABLE_MTLS: Enable mutual TLS (default: true)
  • SERVER_AUTO_CERT: Enable automatic certificate management (default: false)
  • SERVER_CERT_EMAIL: Email for Let's Encrypt account (required when SERVER_AUTO_CERT is true)
  • SERVER_CERT_PATH: Directory for storing certificates (default: ./certs)
  • SERVER_CERT_STAGING: Use Let's Encrypt staging server for testing (default: false)
  • SERVER_ACME_PORT: Port for ACME HTTP-01 challenge (default: 80)
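
As an illustration of how these variables and their defaults fit together, here is a minimal sketch that reads them into a dataclass. The `ServerSettings` class, the `load_server_settings` function, and the set of strings accepted as "true" are assumptions, not the bot's actual code.

```python
import os
from dataclasses import dataclass
from typing import Optional


def _flag(value, default):
    """Parse a boolean variable; the accepted spellings are an assumption."""
    if value is None:
        return default
    return value.strip().lower() in ("1", "true", "yes", "on")


@dataclass(frozen=True)
class ServerSettings:
    domain: str
    host: str = "0.0.0.0"
    port: int = 8443
    enable_mtls: bool = True
    auto_cert: bool = False
    cert_email: Optional[str] = None
    cert_path: str = "./certs"
    cert_staging: bool = False
    acme_port: int = 80


def load_server_settings(env=os.environ):
    """Build settings from SERVER_* variables, applying the documented defaults."""
    domain = env.get("SERVER_DOMAIN")
    if not domain:
        raise ValueError("SERVER_DOMAIN is required")
    settings = ServerSettings(
        domain=domain,
        host=env.get("SERVER_HOST", "0.0.0.0"),
        port=int(env.get("SERVER_PORT", "8443")),
        enable_mtls=_flag(env.get("SERVER_ENABLE_MTLS"), True),
        auto_cert=_flag(env.get("SERVER_AUTO_CERT"), False),
        cert_email=env.get("SERVER_CERT_EMAIL"),
        cert_path=env.get("SERVER_CERT_PATH", "./certs"),
        cert_staging=_flag(env.get("SERVER_CERT_STAGING"), False),
        acme_port=int(env.get("SERVER_ACME_PORT", "80")),
    )
    # The account email is only mandatory in auto-cert mode.
    if settings.auto_cert and not settings.cert_email:
        raise ValueError("SERVER_CERT_EMAIL is required when SERVER_AUTO_CERT is true")
    return settings
```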

Manual SSL Configuration

If not using automated certificates:

  • SERVER_CERT_PATH: Path to certificate directory
  • Place fullchain.pem and privkey.pem in {SERVER_CERT_PATH}/{SERVER_DOMAIN}/
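
The expected directory layout can be checked before starting the service. The helper below is a small sketch of that check; its name, `manual_cert_files`, is hypothetical.

```python
from pathlib import Path


def manual_cert_files(cert_path, domain):
    """Return (fullchain, privkey) paths, raising if either file is missing."""
    base = Path(cert_path) / domain
    fullchain = base / "fullchain.pem"
    privkey = base / "privkey.pem"
    missing = [str(p) for p in (fullchain, privkey) if not p.is_file()]
    if missing:
        raise FileNotFoundError("missing certificate files: " + ", ".join(missing))
    return fullchain, privkey
```

For example, with SERVER_CERT_PATH=./certs and SERVER_DOMAIN=your-webhook-domain.com, the files are looked up under ./certs/your-webhook-domain.com/.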

Message Format

Alert Notifications

Messages are posted to topics based on severity level:

  • Topic: critical, warning, or clear
  • Format: Rich markdown with alert details, timestamps, and links

Example:

🔴 **High CPU Usage**

**Space:** production
**Chart:** system.cpu
**Context:** cpu utilization
**Severity:** Critical
**Time:** 2024-01-15 14:30:00 UTC

**Details:** CPU usage has exceeded 90% for 5 minutes
**Summary:** Critical alert: High CPU usage detected

[View Alert](https://app.netdata.cloud/spaces/...)
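
A message like the one above, routed to a topic by severity, might be assembled as follows. This is a sketch under stated assumptions: the input field names (`title`, `space`, `chart`, `context`, `severity`, `time`, `details`, `summary`, `url`) are hypothetical and do not claim to match the Netdata Cloud payload, and only the 🔴 emoji comes from the example; the warning and clear emoji are guesses.

```python
# Only the critical emoji appears in the example above; the others are assumptions.
SEVERITY_EMOJI = {"critical": "🔴", "warning": "🟡", "clear": "🟢"}


def build_alert_message(alert, stream):
    """Turn an alert dict (hypothetical field names) into a stream-message request."""
    severity = str(alert["severity"]).lower()
    if severity not in SEVERITY_EMOJI:
        raise ValueError(f"unknown severity: {severity!r}")
    lines = [
        f"{SEVERITY_EMOJI[severity]} **{alert['title']}**",
        "",
        f"**Space:** {alert['space']}",
        f"**Chart:** {alert['chart']}",
        f"**Context:** {alert['context']}",
        f"**Severity:** {severity.capitalize()}",
        f"**Time:** {alert['time']}",
        "",
        f"**Details:** {alert['details']}",
        f"**Summary:** {alert['summary']}",
        "",
        f"[View Alert]({alert['url']})",
    ]
    # The topic is the lowercase severity, so alerts group by level.
    return {
        "type": "stream",
        "to": stream,
        "topic": severity,
        "content": "\n".join(lines),
    }
```

The returned dict has the shape accepted by `send_message` in the Zulip Python client, so it could be passed to a configured client unchanged.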

Reachability Notifications

Messages are posted to the reachability topic:

❌ **Host Unreachable**

**Host:** web-server-01
**Status:** ❌ Unreachable  
**Severity:** Critical

**Summary:** Host web-server-01 is no longer reachable

[View Host](https://app.netdata.cloud/...)

Deployment

Systemd Service

Create /etc/systemd/system/netdata-zulip-bot.service:

[Unit]
Description=Netdata Zulip Bot
After=network.target

[Service]
Type=simple
User=netdata-bot
WorkingDirectory=/opt/netdata-zulip-bot
Environment=SERVER_DOMAIN=your-domain.com
ExecStart=/opt/netdata-zulip-bot/venv/bin/netdata-zulip-bot
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

Enable and start:

sudo systemctl enable netdata-zulip-bot
sudo systemctl start netdata-zulip-bot

Docker

FROM python:3.11-slim

WORKDIR /app
COPY . .
RUN pip install -e .

EXPOSE 8443

CMD ["netdata-zulip-bot"]

Security

SSL Certificate Management

The bot includes fully automated SSL certificate management:

  1. Automatic Provisioning: Obtains certificates from Let's Encrypt on first run
  2. Automatic Renewal: Checks daily and renews a certificate once it is within 30 days of expiration
  3. Zero Downtime: Certificate renewal happens in the background
  4. ACME HTTP-01 Challenge: Built-in challenge server (requires port 80 access)
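
The renewal rule in step 2 comes down to a date comparison that a daily check can run. The sketch below shows only that decision; the name `needs_renewal` is hypothetical, and obtaining the certificate's expiry time and running the ACME exchange are out of scope here.

```python
from datetime import datetime, timedelta, timezone

# Renewal window from the list above.
RENEW_BEFORE = timedelta(days=30)


def needs_renewal(not_after, now=None):
    """True once the certificate is within 30 days of expiry, or already expired."""
    if now is None:
        now = datetime.now(timezone.utc)
    return not_after - now <= RENEW_BEFORE
```

Both arguments are expected to be timezone-aware datetimes; mixing aware and naive values raises a TypeError in Python.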

Mutual TLS Authentication

The service supports mutual TLS to authenticate Netdata Cloud webhooks:

  1. Server Certificate: Automatically managed via built-in ACME client
  2. Client Verification: Validates Netdata's client certificate
  3. CA Certificate: Built-in Netdata CA certificate for client validation
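
In Python's standard `ssl` module, the three pieces above map onto a server-side context roughly as follows. This is a sketch, not the bot's implementation: the function name `build_tls_context` and its parameters are assumptions, and the file paths are supplied by the caller.

```python
import ssl


def build_tls_context(enable_mtls=True, certfile=None, keyfile=None, client_ca=None):
    """Server-side TLS context that demands a client certificate when mTLS is on."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if certfile:
        # Server certificate and key, e.g. fullchain.pem and privkey.pem.
        ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    if enable_mtls:
        # The handshake fails unless the client presents a certificate
        # that chains to a trusted CA.
        ctx.verify_mode = ssl.CERT_REQUIRED
        if client_ca:
            ctx.load_verify_locations(cafile=client_ca)
    else:
        ctx.verify_mode = ssl.CERT_NONE
    return ctx
```

With `CERT_REQUIRED` and no CA loaded, every client certificate is rejected, so a real deployment needs the CA file for the clients it expects.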

Webhook Endpoint Security

  • HTTPS-only communication
  • Request logging and monitoring
  • Payload validation and sanitization
  • Error handling without information disclosure

Monitoring

The service provides structured JSON logging for easy monitoring:

{
  "timestamp": "2024-01-15T14:30:00.000Z",
  "level": "info",
  "event": "Message sent to Zulip",
  "stream": "netdata-alerts",
  "topic": "critical",
  "message_id": 12345
}
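
Log lines of this shape can be produced with the standard `logging` module and a custom formatter. The sketch below is one way to do it and makes no claim about the logging library the bot actually uses; `JsonFormatter`, `make_logger`, and the `fields` convention for extra data are hypothetical.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object (hypothetical formatter)."""

    def format(self, record):
        stamp = datetime.fromtimestamp(record.created, timezone.utc)
        entry = {
            "timestamp": stamp.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
            "level": record.levelname.lower(),
            "event": record.getMessage(),
        }
        # Extra data passed as extra={"fields": {...}} is merged into the entry.
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)


def make_logger(name="netdata-zulip-bot", stream=None):
    """Return a logger that writes JSON lines to `stream` (stderr by default)."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A call such as `make_logger().info("Message sent to Zulip", extra={"fields": {"stream": "netdata-alerts", "topic": "critical", "message_id": 12345}})` then emits one JSON object with those keys.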

Health Check

curl -k https://your-domain.com:8443/health

Response:

{
  "status": "healthy",
  "service": "netdata-zulip-bot"
}
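
For scripted monitoring, the same check can be made from Python with the standard library. This is a sketch; the names `is_healthy` and `check_health` are hypothetical. Passing `verify=False` mirrors `curl -k` by skipping certificate verification, which is only appropriate for testing. Whether /health is reachable without a client certificate when mTLS is enabled is not stated here, so that remains an open assumption.

```python
import json
import ssl
import urllib.request


def is_healthy(body):
    """True if a /health response body reports status "healthy"."""
    try:
        data = json.loads(body)
    except (ValueError, TypeError):
        return False
    return isinstance(data, dict) and data.get("status") == "healthy"


def check_health(base_url, timeout=5.0, verify=True):
    """Fetch {base_url}/health and evaluate the JSON response."""
    ctx = ssl.create_default_context()
    if not verify:
        # Equivalent to curl -k: accept any server certificate.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    url = f"{base_url}/health"
    with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
        return resp.status == 200 and is_healthy(resp.read())
```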

Development

Running Tests

pytest

Code Formatting

black .
ruff check .

Local Development

For development, you can disable HTTPS and mTLS:

export SERVER_ENABLE_MTLS=false
# Use HTTP for testing (not recommended for production)

Troubleshooting

Common Issues

  1. Certificate Issues

    • For automated certificates, ensure port 80 is reachable for ACME HTTP-01 challenges
    • The domain must point to your server's IP address
    • Check that SERVER_CERT_EMAIL is set when SERVER_AUTO_CERT is true
    • Use SERVER_CERT_STAGING=true while testing to avoid Let's Encrypt rate limits

  2. Zulip Connection Failed

    • Verify the API credentials in ~/.zuliprc
    • Test the connection against Zulip's API

  3. Webhook Not Receiving Data

    • Check firewall settings for port 8443
    • Verify that the domain resolves in DNS
    • Check the Netdata Cloud webhook configuration

Logs

View service logs:

sudo journalctl -u netdata-zulip-bot -f

License

MIT License - see LICENSE file for details.