# Netdata Zulip Bot

A production-ready webhook service that receives notifications from Netdata Cloud and forwards them to Zulip channels. It is designed to run behind a reverse proxy (such as Caddy) that handles HTTPS and mutual TLS authentication.

## Features

- 🔗 **Reverse Proxy Ready**: HTTP service designed to run behind Caddy/nginx
- 🤝 **Mutual TLS Support**: Client certificate verification when configured at the reverse proxy
- 📊 **Rich Formatting**: Beautiful Zulip messages with emojis and markdown
- 🏷️ **Topic Organization**: Automatic topic routing by severity level
- 📝 **Structured Logging**: JSON-structured logs for monitoring
- ⚡ **High Performance**: FastAPI-based webhook endpoint
- 🔧 **Flexible Configuration**: Support for .zuliprc files or environment variables
- ✅ **Webhook Verification**: Built-in Netdata challenge/response handling

## Quick Start

### 1. Install Dependencies

```bash
# Using uv (recommended)
uv sync
```

### 2. Create Configuration

```bash
# Generate sample configuration files
uv run netdata-zulip-bot --create-config

# Copy and customize
cp .zuliprc.sample ~/.zuliprc
cp .env.sample .env
```

### 3. Configure Zulip Settings

Edit `~/.zuliprc`:

```ini
[api]
site=https://yourorg.zulipchat.com
email=netdata-bot@yourorg.zulipchat.com
key=your-zulip-api-key
stream=netdata-alerts
```

### 4. Configure Environment Variables

Edit `.env` file or set environment variables:

```bash
# Server configuration (HTTP only)
export SERVER_HOST=0.0.0.0
export SERVER_PORT=8080

# Required: Netdata webhook challenge secret
export SERVER_CHALLENGE_SECRET=your-challenge-secret-here

# Optional: Override Zulip stream
export ZULIP_STREAM=netdata-alerts
```

### 5. Run the Service

```bash
# Start the HTTP service
uv run netdata-zulip-bot

# Or with custom configuration
uv run netdata-zulip-bot --zuliprc /path/to/.zuliprc

# The service listens on plain HTTP (default port: 8080)
# Use a reverse proxy like Caddy for HTTPS and mutual TLS
```

## Configuration

### Zulip Configuration

The bot supports two configuration methods:

#### Method 1: Zuliprc File (Recommended)

Create `~/.zuliprc`:

```ini
[api]
site=https://yourorg.zulipchat.com
email=netdata-bot@yourorg.zulipchat.com
key=your-zulip-api-key
stream=netdata-alerts
```
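
As an illustration of how a file in this format can be read, here is a minimal Python sketch using the standard-library `configparser`. The helper name `load_zuliprc` and the choice of which fields count as required are assumptions made for this example; the bot's actual loader may differ.

```python
import configparser
from pathlib import Path

# Assumption: these three fields are treated as mandatory in this sketch.
REQUIRED_KEYS = ("site", "email", "key")


def load_zuliprc(path):
    """Read the [api] section of a zuliprc file and return it as a dict."""
    # Disable interpolation so a literal '%' in a value is not misparsed.
    parser = configparser.ConfigParser(interpolation=None)
    # ConfigParser.read() silently skips missing files, so check the result.
    if not parser.read(Path(path).expanduser()):
        raise FileNotFoundError(f"zuliprc not found: {path}")
    if not parser.has_section("api"):
        raise ValueError("zuliprc is missing the [api] section")
    settings = dict(parser["api"])
    missing = [key for key in REQUIRED_KEYS if not settings.get(key)]
    if missing:
        raise ValueError("zuliprc is missing fields: " + ", ".join(missing))
    return settings
```

Calling `load_zuliprc("~/.zuliprc")` on the file above would return a plain dict with the keys `site`, `email`, `key`, and `stream`.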

#### Method 2: Environment Variables

```bash
export ZULIP_SITE=https://yourorg.zulipchat.com
export ZULIP_EMAIL=netdata-bot@yourorg.zulipchat.com
export ZULIP_API_KEY=your-api-key
export ZULIP_STREAM=netdata-alerts
```

Pass the `--env-config` flag to read these settings from environment variables instead of the zuliprc file:

```bash
uv run netdata-zulip-bot --env-config
```

### Server Configuration

Set these environment variables:

- `SERVER_HOST`: Bind address (default: `0.0.0.0`)
- `SERVER_PORT`: HTTP port (default: `8080`)
- `SERVER_CHALLENGE_SECRET`: Netdata webhook challenge secret (required)
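
The defaults and the required secret listed above could be resolved along these lines. This is a sketch only: the `ServerSettings` class and `load_server_settings` function are hypothetical names for this example, not the bot's real settings code.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerSettings:
    host: str
    port: int
    challenge_secret: str


def load_server_settings(env=None):
    """Build server settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    secret = env.get("SERVER_CHALLENGE_SECRET", "")
    if not secret:
        # The secret has no default, so fail fast when it is absent.
        raise ValueError("SERVER_CHALLENGE_SECRET is required")
    return ServerSettings(
        host=env.get("SERVER_HOST", "0.0.0.0"),
        port=int(env.get("SERVER_PORT", "8080")),
        challenge_secret=secret,
    )
```

Passing a plain dict instead of `os.environ` keeps the function easy to exercise in tests without touching the real environment.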

### Reverse Proxy Setup

The bot is designed to run behind a reverse proxy that handles HTTPS and mutual TLS:

#### Using Caddy (Recommended)

1. Update `Caddyfile` with your domain name
2. Place the Netdata CA certificate in `netdata-ca.pem`
3. Run both services:

```bash
# Start the bot
uv run netdata-zulip-bot &

# Start Caddy
caddy run --config Caddyfile
```

#### Using Docker Compose

```bash
docker-compose up -d
```

## Message Format

### Alert Notifications

Messages are posted to topics based on severity level:

- **Topic**: `critical`, `warning`, or `clear`
- **Format**: Rich markdown with alert details, timestamps, and links

Example:
```
🔴 **High CPU Usage**

**Space:** production
**Chart:** system.cpu
**Context:** cpu utilization
**Severity:** Critical
**Time:** 2024-01-15 14:30:00 UTC

**Details:** CPU usage has exceeded 90% for 5 minutes

**Summary:** Critical alert: High CPU usage detected

[View Alert](https://app.netdata.cloud/spaces/...)
```
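
To make the severity-to-topic routing concrete, the following Python sketch maps a severity to its topic and renders a shortened version of the message above. The dictionary keys (`name`, `space`, `chart`, `severity`, `summary`) and the emojis for `warning` and `clear` are assumptions for illustration; the real Netdata payload fields and the bot's formatter may differ.

```python
# Only the red circle for "critical" appears in the example above;
# the other two emojis are placeholders chosen for this sketch.
SEVERITY_EMOJI = {"critical": "🔴", "warning": "🟡", "clear": "🟢"}


def topic_for_severity(severity):
    """Return the Zulip topic name for a Netdata severity string."""
    topic = severity.strip().lower()
    if topic not in SEVERITY_EMOJI:
        raise ValueError(f"unknown severity: {severity!r}")
    return topic


def format_alert(alert):
    """Render an alert dict (hypothetical field names) as Zulip markdown."""
    topic = topic_for_severity(alert["severity"])
    lines = [
        f"{SEVERITY_EMOJI[topic]} **{alert['name']}**",
        "",
        f"**Space:** {alert['space']}",
        f"**Chart:** {alert['chart']}",
        f"**Severity:** {topic.capitalize()}",
        "",
        f"**Summary:** {alert['summary']}",
    ]
    return "\n".join(lines)
```

Normalizing the severity once in `topic_for_severity` means the topic name and the emoji lookup cannot drift apart.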

### Reachability Notifications

Messages are posted to the `reachability` topic:

```
❌ **Host Unreachable**

**Host:** web-server-01
**Status:** ❌ Unreachable
**Severity:** Critical

**Summary:** Host web-server-01 is no longer reachable

[View Host](https://app.netdata.cloud/...)
```
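
For reference, a message like this is delivered to Zulip as a stream message addressed by stream name and topic. The sketch below builds the request dict in the shape accepted by `Client.send_message` from the `zulip` Python package; the helper name `build_stream_message` is hypothetical, and whether the bot uses that client internally is an assumption.

```python
def build_stream_message(stream, topic, content):
    """Build the request dict for a Zulip stream message."""
    return {
        "type": "stream",
        "to": stream,
        "topic": topic,
        "content": content,
    }


request = build_stream_message(
    stream="netdata-alerts",
    topic="reachability",
    content="❌ **Host Unreachable**\n\n**Host:** web-server-01",
)
# With the zulip package installed, this dict could be sent with:
#   zulip.Client(config_file="~/.zuliprc").send_message(request)
```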

## Deployment

### Systemd Service

See `examples/netdata-zulip-bot.service` for a complete systemd service configuration.

### Automated Setup

Use the provided setup script:

```bash
sudo ./scripts/setup.sh --domain your-domain.com --email admin@example.com
```

### Docker

The included `Dockerfile` and `docker-compose.yml` provide a complete setup with Caddy reverse proxy:

```bash
docker-compose up -d
```

## Security

### Architecture

The bot uses a security-focused architecture:

1. **HTTP Backend**: Simple HTTP service with no direct internet exposure
2. **Reverse Proxy**: Caddy handles HTTPS, certificates, and client authentication
3. **Mutual TLS**: Client certificate validation at the reverse proxy level

### Webhook Security

- **Challenge/Response**: Built-in Netdata webhook verification using HMAC-SHA256
- **Payload Validation**: Strict payload parsing and validation
- **Request Logging**: Comprehensive logging of all webhook requests
- **Error Handling**: Secure error responses without information disclosure
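
The challenge/response step can be sketched in Python as follows. It assumes the scheme commonly described for Netdata Cloud webhooks: the `crc_token` is signed with HMAC-SHA256 using the challenge secret, and the base64-encoded digest is returned as `response_token` with a `sha256=` prefix. The function name `challenge_response` is hypothetical, and the exact response shape should be checked against the Netdata documentation and the bot's source.

```python
import base64
import hashlib
import hmac


def challenge_response(crc_token, secret):
    """Sign the crc_token with the shared secret (HMAC-SHA256)."""
    digest = hmac.new(
        secret.encode("utf-8"),
        msg=crc_token.encode("utf-8"),
        digestmod=hashlib.sha256,
    ).digest()
    # Assumed response format: "sha256=" followed by the base64 digest.
    token = "sha256=" + base64.b64encode(digest).decode("ascii")
    return {"response_token": token}
```

Because the digest depends on the secret, a mismatch between the secret configured in Netdata Cloud and `SERVER_CHALLENGE_SECRET` yields a different token and verification fails.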

### SSL Certificate Management

SSL certificates are managed by the reverse proxy (Caddy):

1. **Automatic Provisioning**: Caddy obtains Let's Encrypt certificates
2. **Automatic Renewal**: Built-in certificate renewal
3. **Mutual TLS**: Client certificate validation using Netdata CA certificate

## Monitoring

The service provides structured JSON logging for easy monitoring:

```json
{
  "timestamp": "2024-01-15T14:30:00.000Z",
  "level": "info",
  "event": "Message sent to Zulip",
  "stream": "netdata-alerts",
  "topic": "critical",
  "message_id": 12345
}
```
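
A log line of this shape can be produced with the standard-library `logging` module alone. The sketch below is one possible approach, not the bot's actual logging setup (which may use a dedicated structured-logging library); the `JsonFormatter` class and the logger name are hypothetical.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Format log records as single-line JSON objects."""

    def __init__(self, extra_keys=("stream", "topic", "message_id")):
        super().__init__()
        self.extra_keys = extra_keys

    def format(self, record):
        created = datetime.fromtimestamp(record.created, tz=timezone.utc)
        payload = {
            # Render UTC with a trailing "Z", as in the sample above.
            "timestamp": created.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
            "level": record.levelname.lower(),
            "event": record.getMessage(),
        }
        # Copy selected fields passed through logging's `extra` argument.
        for key in self.extra_keys:
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("netdata_zulip_bot_sketch")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.info(
        "Message sent to Zulip",
        extra={"stream": "netdata-alerts", "topic": "critical", "message_id": 12345},
    )
```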

### Health Check

```bash
# Direct HTTP check (backend service)
curl http://localhost:8080/health

# Through reverse proxy
curl https://your-domain.com/health
```

Response:
```json
{
  "status": "healthy",
  "service": "netdata-zulip-bot"
}
```
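
For scripted monitoring, the same check can be done from Python with only the standard library. This is a sketch: the function names `is_healthy` and `check_health` are hypothetical, and the default URL assumes the bot is listening locally on port 8080.

```python
import json
import urllib.request


def is_healthy(body):
    """Return True if the response body reports status 'healthy'."""
    try:
        data = json.loads(body)
    except ValueError:
        return False
    return isinstance(data, dict) and data.get("status") == "healthy"


def check_health(url="http://localhost:8080/health", timeout=5.0):
    """Fetch the health endpoint; any network or decode error counts as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and is_healthy(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False
```

Keeping the body parsing in `is_healthy` separate from the network call makes the check easy to test without a running service.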

## Development

### Running Tests

```bash
uv run python -m pytest tests/ -v
```

### Code Formatting

```bash
uv run black .
uv run ruff check .
```

### Local Development

For development, you can run the HTTP service directly:

```bash
# Set required environment variables
export SERVER_CHALLENGE_SECRET=test-secret

# Run the service
uv run netdata-zulip-bot

# Test webhook endpoint (quote the URL so the shell does not interpret "?")
curl -X POST "http://localhost:8080/webhook/netdata?crc_token=test123"
```

## Troubleshooting

### Common Issues

1. **Configuration Issues**
   - Ensure `SERVER_CHALLENGE_SECRET` is set (required for Netdata webhook verification)
   - Verify that the `.zuliprc` file contains all required fields
   - Check that the Zulip bot has permission to post to the configured stream

2. **Reverse Proxy Issues**
   - Ensure the Caddy configuration uses the correct domain name
   - Verify that the Netdata CA certificate is properly configured
   - Check that port 80 is accessible for Let's Encrypt challenges

3. **Webhook Not Receiving Data**
   - Verify that the Netdata Cloud webhook URL points to your reverse proxy
   - Check that the webhook challenge secret matches the configuration
   - Review service logs for error messages

### Logs

View service logs:
```bash
sudo journalctl -u netdata-zulip-bot -f
```

## License

MIT License - see LICENSE file for details.